Preface to the Third Edition


For some forty years the first and second editions of this book have been 
Preface to the Second Edition


Twenty-six years have passed since the first edition of this book was pub- 


subject and a study of their properties. The general outline of topics has been 
retained. 

The method of maximum likelihood has been augmented by other
considerations. In point estimation of the mean vector and covariance matrix
alternatives to the maximum likelihood estimators that are better with 
it procedures. New results on distributions
given; some significant points are tabulated, 
such as power functions, admissibility, unbi¬ 
power functions, are studied. Simultaneous 
md covariances are developed. A chapter on 

ipter sketching miscellaneous results in the 

including simultaneous equations models and


ationships, are introduced. Additional problems present 

o cover all relevant material in this book; what seems 
been included. For a comprehensive listing of papers 
until 1970 the reader is referred to A Bibliography of
Multivariate Statistical Analysis by Anderson, Das Gupta, and Styan (1972).












































Preface to the First Edition 


This book has been designed primarily as a text for a two-semester course in 
CHAPTER 1


Introduction 


1.1. MULTIVARIATE STATISTICAL ANALYSIS 

Multivariate statistical analysis is concerned with data that consist of sets of
measurements on a number of individuals or objects. The sample data may
be heights and weights of some individuals drawn randomly from a population
of school children in a given city, or the statistical treatment may be
made on a collection of measurements, such as lengths and widths of petals
and lengths and widths of sepals of iris plants taken from two species, or one
may study the scores on batteries of mental tests administered to a number of
students.

The measurements made on a single individual can be assembled into a
column vector. We think of the entire vector as an observation from a
multivariate population or distribution. When the individual is drawn randomly,
we consider the vector as a random vector with a distribution or
probability law describing that population. The set of observations on all
individuals in a sample constitutes a sample of vectors, and the vectors set
side by side make up the matrix of observations.† The data to be analyzed
then are thought of as displayed in a matrix or in several matrices.

We shall see that it is helpful in visualizing the data and understanding the
methods to think of each observation vector as constituting a point in a
Euclidean space, each coordinate corresponding to a measurement or variable.
Indeed, an early step in the statistical analysis is plotting the data; since

†When data are listed on paper by individual, it is natural to print the measurements on one
individual as a row of the table; then one individual corresponds to a row vector. Since we prefer
to operate algebraically with column vectors, we have chosen to treat observations in terms of
column vectors. (In practice, the basic data set may well be on cards, tapes, or disks.)
variability of sample covariances depending on fourth-order moments.
This inflexibility of normal methods with respect to moments of c 



computation. Packages of statistical programs are available for most of the 
methods. 












































































The Multivariate 
Normal Distribution 


2.1. INTRODUCTION 


2.2. NOTIONS OF MULTIVARIATE DISTRIBUTIONS


2.2.1. Joint Distributions 

In this section we shall consider the notions of joint distributions of several
variables, derived marginal distributions of subsets of variables, and derived
conditional distributions. First consider the case of two (real) random
variables† $X$ and $Y$. Probabilities of events defined in terms of these variables
can be obtained by operations involving the cumulative distribution function
(abbreviated as cdf),

(1)  $F(x, y) = \Pr\{X \le x,\ Y \le y\},$

defined for every pair of real numbers $(x, y)$. We are interested in cases
where $F(x, y)$ is absolutely continuous; this means that the following partial
derivative exists almost everywhere:

(2)  $\dfrac{\partial^2 F(x, y)}{\partial x\,\partial y} = f(x, y),$

and

(3)  $F(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(u, v)\,du\,dv.$


The nonnegative function $f(x, y)$ is called the density of $X$ and $Y$. The pair
of random variables $(X, Y)$ defines a random point in a plane. The probability
that $(X, Y)$ falls in a rectangle is

(4)  $\Pr\{x \le X \le x + \Delta x,\ y \le Y \le y + \Delta y\}$
     $= F(x + \Delta x, y + \Delta y) - F(x + \Delta x, y) - F(x, y + \Delta y) + F(x, y)$
     $= \int_{y}^{y+\Delta y}\int_{x}^{x+\Delta x} f(u, v)\,du\,dv$

$(\Delta x > 0,\ \Delta y > 0)$. The probability of the random point $(X, Y)$ falling in any
set $E$ for which the following integral is defined (that is, any measurable set
$E$) is

(5)  $\Pr\{(X, Y) \in E\} = \iint_E f(x, y)\,dx\,dy.$
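
The rectangle formula can be checked numerically. The sketch below is only an illustration under an assumed example (two independent standard normal variables, which the text does not single out); the function names are hypothetical. It evaluates the four-term cdf expression and compares it with the density times the area of a small rectangle.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Cdf of a standard normal variable, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def joint_cdf(x: float, y: float) -> float:
    """F(x, y) for two independent standard normal variables (assumed example)."""
    return std_normal_cdf(x) * std_normal_cdf(y)


def joint_density(x: float, y: float) -> float:
    """f(x, y) for the same assumed example."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)


def rectangle_probability(x: float, y: float, dx: float, dy: float) -> float:
    """Pr{x <= X <= x+dx, y <= Y <= y+dy} from the four-term cdf expression."""
    return (joint_cdf(x + dx, y + dy) - joint_cdf(x + dx, y)
            - joint_cdf(x, y + dy) + joint_cdf(x, y))


# Probability of a small rectangle, and the density-times-area approximation.
p_exact = rectangle_probability(0.3, -0.2, 0.01, 0.01)
p_approx = joint_density(0.3, -0.2) * 0.01 * 0.01

# A rectangle covering essentially the whole plane has probability close to 1.
p_total = rectangle_probability(-10.0, -10.0, 20.0, 20.0)
```

For a small rectangle the two numbers agree to within a fraction of a percent, which is the sense in which $f(x, y)\,\Delta x\,\Delta y$ acts as a probability element.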

†In Chapter 2 we shall distinguish between random variables and running variables by use of
capital and lowercase letters, respectively. In later chapters we may be unable to hold to this
convention because























integral as the limit of sums of the
variables, the probability element
probability that $X$ falls between $x$ and
since

$= \int_{y}^{y+\Delta y}\int_{x}^{x+\Delta x} f(u, v)\,du\,dv$

$= f(x_0, y_0)\,\Delta x\,\Delta y$

$y + \Delta y$) by the mean value theorem.
(6) is approximately $f(x, y)\,\Delta x\,\Delta y$.






(10) F(x u 

The probability ol 
Euclidean space is 




(11) Pr((A: 


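cdf of $X$ is

(13)  $\Pr\{X \le x\} = \Pr\{X \le x,\ Y \le \infty\} = F(x, \infty).$

Let this be $F(x)$. Clearly

(14)  $F(x) = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f(u, v)\,dv\,du.$

We call

(15)  $\int_{-\infty}^{\infty} f(u, v)\,dv = f(u),$

say, the marginal density of $X$. Then (14) is

(16)  $F(x) = \int_{-\infty}^{x} f(u)\,du.$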


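, $X_p$, we wish to find the marginal cdf of some of them, say, of
$X_1, \ldots, X_r$ $(r < p)$. It is

(17)  $\Pr\{X_1 \le x_1, \ldots, X_r \le x_r\}$
      $= \Pr\{X_1 \le x_1, \ldots, X_r \le x_r,\ X_{r+1} \le \infty, \ldots, X_p \le \infty\}$
      $= F(x_1, \ldots, x_r, \infty, \ldots, \infty).$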






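(20)  $F(x, y) = F(x)G(y),$

where $F(x)$ is the marginal cdf of $X$ and $G(y)$ is the marginal cdf of $Y$. This
implies that the density of $X$, $Y$ is

(21)  $f(x, y) = \dfrac{\partial^2 F(x, y)}{\partial x\,\partial y} = \dfrac{\partial^2 F(x)G(y)}{\partial x\,\partial y} = \dfrac{dF(x)}{dx}\,\dfrac{dG(y)}{dy} = f(x)g(y).$

Conversely, if $f(x, y) = f(x)g(y)$, then

(22)  $F(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(u, v)\,du\,dv = \int_{-\infty}^{y}\int_{-\infty}^{x} f(u)g(v)\,du\,dv$
      $= \int_{-\infty}^{x} f(u)\,du \int_{-\infty}^{y} g(v)\,dv = F(x)G(y).$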

Thus an equivalent definition of independence in the case of densities 


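$X_1, \ldots, X_p$ are mutually independent, then

(26)  $\Pr\{x_{1L} \le X_1 \le x_{1U}, \ldots, x_{pL} \le X_p \le x_{pU}\}$
      $= \int_{x_{pL}}^{x_{pU}} \cdots \int_{x_{1L}}^{x_{1U}} f_1(x_1) \cdots f_p(x_p)\,dx_1 \cdots dx_p$
      $= \prod_{i=1}^{p} \int_{x_{iL}}^{x_{iU}} f_i(x_i)\,dx_i$
      $= \prod_{i=1}^{p} \Pr\{x_{iL} \le X_i \le x_{iU}\}.$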


2.2.4. Conditional Distributions 


If $A$ and $B$ are two events such that the probability of $A$ and $B$ occurring
simultaneously is $P(AB)$ and the probability of $B$ occurring is $P(B) > 0$,
then the conditional probability of $A$ occurring given that $B$ has occurred is
$P(AB)/P(B)$. Suppose the event $A$ is $X$ falling in the interval $[x_1, x_2]$ and
the event $B$ is $Y$ falling in $[y_1, y_2]$. Then the conditional probability that $X$
falls in $[x_1, x_2]$, given that $Y$ falls in $[y_1, y_2]$, is


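(27)  $\Pr\{x_1 \le X \le x_2 \mid y_1 \le Y \le y_2\} = \dfrac{\Pr\{x_1 \le X \le x_2,\ y_1 \le Y \le y_2\}}{\Pr\{y_1 \le Y \le y_2\}} = \dfrac{\int_{y_1}^{y_2}\int_{x_1}^{x_2} f(u, v)\,du\,dv}{\int_{y_1}^{y_2} g(v)\,dv}.$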


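Now let $y_1 = y$, $y_2 = y + \Delta y$. Then for a continuous density,

(28)  $\int_{y}^{y+\Delta y} g(v)\,dv = g(y^*)\,\Delta y,$

where $y \le y^* \le y + \Delta y$. Also

(29)  $\int_{y}^{y+\Delta y} f(u, v)\,dv = f[u, y^*(u)]\,\Delta y,$

where $y \le y^*(u) \le y + \Delta y$. Therefore,

(30)  $\Pr\{x_1 \le X \le x_2 \mid y \le Y \le y + \Delta y\} = \int_{x_1}^{x_2} \dfrac{f[u, y^*(u)]}{g(y^*)}\,du.$

It will be noticed that for fixed $y$ and $\Delta y$ $(> 0)$, the integrand of (30) behaves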











































up to r 19 (_y). Harris and Soms (1980) have studied generalizations of (57). 


2.4. THE DISTRIBUTION OF LINEAR COMBINATIONS OF
NORMALLY DISTRIBUTED VARIATES; INDEPENDENCE
OF VARIATES; MARGINAL DISTRIBUTIONS




















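(3)  $\operatorname{mod}|C^{-1}| = \dfrac{1}{\operatorname{mod}|C|} = \dfrac{|\Sigma|^{1/2}}{|C\Sigma C'|^{1/2}}.$

The quadratic form in the exponent of $n(x \mid \mu, \Sigma)$ is

(4)  $Q = (x - \mu)'\Sigma^{-1}(x - \mu).$

The transformation (2) carries $Q$ into

(5)  $Q = (C^{-1}y - \mu)'\Sigma^{-1}(C^{-1}y - \mu)$
     $= (C^{-1}y - C^{-1}C\mu)'\Sigma^{-1}(C^{-1}y - C^{-1}C\mu)$
     $= [C^{-1}(y - C\mu)]'\Sigma^{-1}[C^{-1}(y - C\mu)]$
     $= (y - C\mu)'(C^{-1})'\Sigma^{-1}C^{-1}(y - C\mu)$
     $= (y - C\mu)'(C\Sigma C')^{-1}(y - C\mu),$

since $(C^{-1})' = (C')^{-1}$ by virtue of transposition of $CC^{-1} = I$. Thus the
density of $Y$ is

(6)  $n(C^{-1}y \mid \mu, \Sigma)\,\operatorname{mod}|C|^{-1}$
     $= (2\pi)^{-p/2}|C\Sigma C'|^{-1/2}\exp[-\tfrac{1}{2}(y - C\mu)'(C\Sigma C')^{-1}(y - C\mu)]$
     $= n(y \mid C\mu, C\Sigma C').$ ∎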
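
The two facts used in this proof, invariance of the quadratic form and the value of the Jacobian factor, can be checked on a small case. The sketch below assumes a $2 \times 2$ example with arbitrarily chosen $\mu$, $\Sigma$, $C$, and $x$; all names and numbers are illustrative only.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matvec(a, v):
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def quadratic_form(v, m):
    """v' m^{-1} v for a 2-vector v and a nonsingular 2 x 2 matrix m."""
    w = matvec(inverse2(m), v)
    return sum(vi * wi for vi, wi in zip(v, w))


# Assumed example values (not taken from the text).
mu = [1.0, -2.0]
sigma = [[2.0, 0.6], [0.6, 1.0]]
c = [[1.0, 2.0], [0.0, 3.0]]      # nonsingular: det C = 3
x = [0.5, 1.5]

y = matvec(c, x)                  # y = C x
c_mu = matvec(c, mu)              # mean of Y
c_sigma_ct = matmul(matmul(c, sigma), transpose(c))  # covariance of Y

# Quadratic forms before and after the transformation.
q_x = quadratic_form([xi - mi for xi, mi in zip(x, mu)], sigma)
q_y = quadratic_form([yi - mi for yi, mi in zip(y, c_mu)], c_sigma_ct)

# Jacobian factor written as |Sigma|^{1/2} / |C Sigma C'|^{1/2}.
jacobian_factor = (det2(sigma) / det2(c_sigma_ct)) ** 0.5
```

Here `q_x` and `q_y` coincide, and `jacobian_factor` equals $1/\operatorname{mod}|C|$.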

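Now let us consider two sets of random variables $X_1, \ldots, X_q$ and
$X_{q+1}, \ldots, X_p$ forming the vectors

(7)  $X^{(1)} = \begin{pmatrix} X_1 \\ \vdots \\ X_q \end{pmatrix}, \qquad X^{(2)} = \begin{pmatrix} X_{q+1} \\ \vdots \\ X_p \end{pmatrix}.$

These variables form the random vector

(8)  $X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix} = \begin{pmatrix} X_1 \\ \vdots \\ X_p \end{pmatrix}.$

Now let us assume that the $p$ variates have a joint normal distribution with
mean vectors

(9)  $\mathcal{E}X^{(1)} = \mu^{(1)}, \qquad \mathcal{E}X^{(2)} = \mu^{(2)},$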















$Z = DX,$


where $Z$ has $q$ components and $D$ is a $q \times p$ real matrix. The expected value
of $Z$ is

(31)  $\mathcal{E}Z = D\mu,$

and the covariance matrix is

(32)  $\mathcal{E}(Z - D\mu)(Z - D\mu)' = D\Sigma D'.$

The case $q = p$ and $D$ nonsingular has been treated above. If $q \le p$ and $D$ is
of rank $q$, we can find a $(p - q) \times p$ matrix $E$ such that

(33)  $\begin{pmatrix} Z \\ W \end{pmatrix} = \begin{pmatrix} D \\ E \end{pmatrix} X$

is a nonsingular transformation. (See Appendix, Section A.3.) Then $Z$ and $W$
have a joint normal distribution, and $Z$ has a marginal normal distribution by
Theorem 2.4.3: Thus for D of rank q (and X having a nonsingular distribu- 







defines a random vector $V$ with covariance matrix (36) and a mean vector

(38)  $\mathcal{E}V = B\mu = \nu = \begin{pmatrix} \nu^{(1)} \\ \nu^{(2)} \end{pmatrix},$

say. Since the variances of the elements of $V^{(2)}$ are zero, $V^{(2)} = \nu^{(2)}$ with
probability 1. Now partition

(39)  $B^{-1} = (C \quad D),$

where $C$ consists of $r$ columns. Then (37) is equivalent to

(40)  $X = B^{-1}V = (C \quad D)\begin{pmatrix} V^{(1)} \\ V^{(2)} \end{pmatrix} = CV^{(1)} + DV^{(2)}.$

Thus with probability 1

(41)  $X = CV^{(1)} + D\nu^{(2)},$

which is of the form of (34) with $C$ as $A$, $V^{(1)}$ as $Y$, and $D\nu^{(2)}$ as














$N(\nu, T)$, we can write


where $DA$ is $q \times r$. If the rank of $DA$ is $r$, the theorem is proved. If the rank
is less than $r$, say $s$, then the covariance matrix of $Z$,


say, is of rank $s$. By Theorem A.4.1 of the Appendix, there is a nonsingular


$= \begin{pmatrix} (F_1DA)T(F_1DA)' & (F_1DA)T(F_2DA)' \\ (F_2DA)T(F_1DA)' & (F_2DA)T(F_2DA)' \end{pmatrix} = \begin{pmatrix} I_s & 0 \\ 0 & 0 \end{pmatrix}.$

Thus $F_1DA$ is of rank $s$ (by the converse of Theorem A.1.1 of the Appendix),
and $F_2DA = 0$ because each diagonal element of $(F_2DA)T(F_2DA)'$ is a
quadratic form in a row of $F_2DA$ with positive definite matrix $T$. Thus the
covariance matrix of $FZ$ is (46), and

(47)  $FZ = \begin{pmatrix} F_1 \\ F_2 \end{pmatrix}DAY + FD\lambda = \begin{pmatrix} U_1 \\ 0 \end{pmatrix} + FD\lambda,$

say. Clearly $U_1$ has a nonsingular normal distribution. Let $F^{-1} = (G_1 \quad G_2)$. Then

(48)  $Z = F^{-1}FZ = G_1U_1 + D\lambda,$

which is of the form (34).




2.5 CONDITIONAL DISTRIBUTIONS; MULTIPLE CORRELATION


$Y$ is constant on ellipsoids

(49)  $(y - C\mu)'(C\Sigma C')^{-1}(y - C\mu) = k.$

The marginal distribution of $X^{(1)}$ is the projection of the mass of the
distribution of $X$ onto the $q$-dimensional space of the first $q$ coordinate axes.











$\Sigma_{11\cdot2}^{-1}[(x^{(1)} - \mu^{(1)}) - \Sigma_{12}\Sigma_{22}^{-1}(x^{(2)} - \mu^{(2)})]\}.$
It is understood that $x^{(2)}$ consists of $p - q$ numbers. The density $f(x^{(1)} \mid x^{(2)})$

Theorem 2.5.1. Let the components of $X$ be divided into two groups composing
the subvectors $X^{(1)}$ and $X^{(2)}$. Suppose the mean $\mu$ is similarly divided into
$\mu^{(1)}$ and $\mu^{(2)}$, and suppose the covariance matrix $\Sigma$ of $X$ is divided into





















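(11)  $\mathcal{V}(X_i - \alpha'X^{(2)})$
      $= \mathcal{E}[X_i - \mu_i - \alpha'(X^{(2)} - \mu^{(2)})]^2$
      $= \mathcal{E}[X_i^{(1\cdot2)} - \mathcal{E}X_i^{(1\cdot2)} + (\beta_{(i)} - \alpha)'(X^{(2)} - \mu^{(2)})]^2$
      $= \mathcal{V}[X_i^{(1\cdot2)}] + (\beta_{(i)} - \alpha)'\,\mathcal{E}(X^{(2)} - \mu^{(2)})(X^{(2)} - \mu^{(2)})'\,(\beta_{(i)} - \alpha)$
      $= \mathcal{V}(X_i^{(1\cdot2)}) + (\beta_{(i)} - \alpha)'\Sigma_{22}(\beta_{(i)} - \alpha).$


Since $\Sigma_{22}$ is positive definite, the quadratic form in $\beta_{(i)} - \alpha$ is nonnegative
and attains its minimum of 0 at $\alpha = \beta_{(i)}$. ∎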
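
The minimizing property can be illustrated in the simplest case, where $X^{(2)}$ has a single component. The sketch below assumes hypothetical values for the variances and covariance; it compares the variance of $X_1 - \alpha X_2$ over a grid of $\alpha$ with the value at the regression coefficient $\beta = \sigma_{12}/\sigma_{22}$.

```python
def residual_variance(alpha, s11, s12, s22):
    """Variance of X1 - alpha * X2 when Var X1 = s11, Cov(X1, X2) = s12, Var X2 = s22."""
    return s11 - 2.0 * alpha * s12 + alpha * alpha * s22


# Assumed second-order moments (illustrative only).
s11, s12, s22 = 4.0, 1.5, 2.25

beta = s12 / s22                          # regression coefficient
partial_variance = s11 - s12 * s12 / s22  # variance of the residual at alpha = beta

# Excess variance over a grid of alternative coefficients.
grid = [beta + 0.1 * k for k in range(-20, 21)]
excess = [residual_variance(a, s11, s12, s22) - partial_variance for a in grid]

# The excess should equal the quadratic form (beta - alpha)^2 * s22.
quadratic = [(beta - a) ** 2 * s22 for a in grid]
```

Every entry of `excess` is nonnegative and matches the corresponding entry of `quadratic`, with the minimum of 0 at `alpha = beta`.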





















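It follows that this is

(15)  $\bar{R}_{i\cdot q+1,\ldots,p} = \dfrac{\sigma_{(i)}'\beta_{(i)}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{(i)}'\Sigma_{22}^{-1}\sigma_{(i)}}} = \dfrac{\sqrt{\sigma_{(i)}'\Sigma_{22}^{-1}\sigma_{(i)}}}{\sqrt{\sigma_{ii}}}.$

A useful formula is

(16)  $1 - \bar{R}^2_{i\cdot q+1,\ldots,p} = \dfrac{\sigma_{ii} - \sigma_{(i)}'\Sigma_{22}^{-1}\sigma_{(i)}}{\sigma_{ii}} = \dfrac{|\Sigma_i|}{\sigma_{ii}|\Sigma_{22}|},$

where Theorem A.3.2 of the Appendix has been applied to

(17)  $\Sigma_i = \begin{pmatrix} \sigma_{ii} & \sigma_{(i)}' \\ \sigma_{(i)} & \Sigma_{22} \end{pmatrix}.$

Since

(18)  $\sigma_{ii\cdot q+1,\ldots,p} = \sigma_{ii} - \sigma_{(i)}'\Sigma_{22}^{-1}\sigma_{(i)},$

it follows that

(19)  $\sigma_{ii\cdot q+1,\ldots,p} = (1 - \bar{R}^2_{i\cdot q+1,\ldots,p})\,\sigma_{ii}.$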



This shows incidentally that any partial variance of a component of $X$ cannot
be greater than the variance. In fact, the larger $\bar{R}_{i\cdot q+1,\ldots,p}$ is, the greater the
reduction in variance on going to the conditional distribution.
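
A small numerical case shows how the squared multiple correlation, the partial variance, and the determinant ratio fit together. The sketch assumes a hypothetical $3 \times 3$ covariance matrix with $X_1$ as the variable of interest and $(X_2, X_3)$ as the conditioning set; the numbers are not from the text.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Assumed covariance matrix of (X1, X2, X3); positive definite.
sigma = [[4.0, 1.0, 1.2],
         [1.0, 2.0, 0.5],
         [1.2, 0.5, 3.0]]

s_ii = sigma[0][0]                       # variance of X1
s_i = [sigma[0][1], sigma[0][2]]         # covariances of X1 with (X2, X3)
s22 = [[sigma[1][1], sigma[1][2]],
       [sigma[2][1], sigma[2][2]]]       # covariance matrix of (X2, X3)

d = det2(s22)
s22_inv = [[s22[1][1] / d, -s22[0][1] / d],
           [-s22[1][0] / d, s22[0][0] / d]]

# sigma_(i)' Sigma_22^{-1} sigma_(i): the part of the variance explained by (X2, X3).
explained = sum(s_i[a] * s22_inv[a][b] * s_i[b] for a in range(2) for b in range(2))

r_squared = explained / s_ii             # squared multiple correlation
partial_variance = s_ii - explained      # variance in the conditional distribution
determinant_ratio = det3(sigma) / (s_ii * det2(s22))
```

In this example `partial_variance` equals `(1 - r_squared) * s_ii`, and `1 - r_squared` equals `determinant_ratio`.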
Another reason for considering the multiple correlation coefficient






洛 arm correianons can oe aennea as tne covariances 

siduals yielding (3) and (8). Then these quantities do 

iterpretations in terms of conditional distributions. In 

ii + P 0) (x {2) - jji (2) ) is the conditional expectation of 
out regard to normality, X 「is uncorrelated 
f( 2 ) ， SX t \X {2) minimizes S[X t -h{X {2) )f with respect 
X {2 \ and maximizes the correlation between 

(2) . (See Problems 2.48 to 2.51.) 

for Partial Correlations 

ions between several conditional distributions obtained 
'erent sets of variates fixed. These relations are useful 


this i 


.generaliza- 


尤⑴ 
















；) ^ (3) ]=S n .3 

1 calculation 






































contoured distributions with density


(2)  $|\Lambda|^{-1/2}\,g[(x - \nu)'\Lambda^{-1}(x - \nu)],$

where $\Lambda$ is a positive definite matrix.






















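where $-\tfrac{1}{2}\pi < \theta_i \le \tfrac{1}{2}\pi$, $i = 1, \ldots, p - 2$, $-\pi < \theta_{p-1} \le \pi$, and $0 \le r < \infty$.

Note that $y'y = r^2$. The Jacobian of the transformation (4) is
$r^{p-1}\cos^{p-2}\theta_1\cos^{p-3}\theta_2 \cdots \cos\theta_{p-2}$. See Problem 7.1. If $g(y'y)$ is the density
of $Y$, then the density of $R, \Theta_1, \ldots, \Theta_{p-1}$ is

(5)  $r^{p-1}\cos^{p-2}\theta_1\cos^{p-3}\theta_2 \cdots \cos\theta_{p-2}\,g(r^2).$

Note that $R, \Theta_1, \ldots, \Theta_{p-1}$ are independently distributed. Since

(6)  $\int_{-\pi/2}^{\pi/2} \cos^{h-1}\theta\,d\theta = \dfrac{\Gamma(\tfrac{1}{2}h)\,\Gamma(\tfrac{1}{2})}{\Gamma[\tfrac{1}{2}(h+1)]}$

(Problem 7.2), the marginal density of $R$ is

(7)  $C(p)\,r^{p-1}g(r^2),$

where

(8)  $C(p) = \dfrac{2\pi^{p/2}}{\Gamma(\tfrac{1}{2}p)} = \int_{-\pi}^{\pi}\int_{-\pi/2}^{\pi/2} \cdots \int_{-\pi/2}^{\pi/2} \cos^{p-2}\theta_1\cos^{p-3}\theta_2 \cdots \cos\theta_{p-2}\,d\theta_1 \cdots d\theta_{p-2}\,d\theta_{p-1}.$

The marginal density of $\Theta_i$ is $\Gamma[\tfrac{1}{2}(p - i + 1)]\cos^{p-i-1}\theta_i / \{\Gamma(\tfrac{1}{2})\,\Gamma[\tfrac{1}{2}(p - i)]\}$,
$i = 1, \ldots, p - 2$, and of $\Theta_{p-1}$ is $1/(2\pi)$.

In the normal case of $N(0, I)$ the density of $Y$ is

$g(y'y) = (2\pi)^{-p/2}\exp(-\tfrac{1}{2}y'y),$

and the density of $R = (Y'Y)^{1/2}$ is $r^{p-1}\exp(-\tfrac{1}{2}r^2)/[2^{p/2-1}\Gamma(\tfrac{1}{2}p)]$. The density
of $r^2 = v$ is $v^{p/2-1}e^{-v/2}/[2^{p/2}\Gamma(\tfrac{1}{2}p)]$. This is the $\chi^2$-density with $p$ degrees of
freedom.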
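
As a hedged numerical check of the constant $C(p)$ and of the radial density in the normal case, the sketch below evaluates $2\pi^{p/2}/\Gamma(\tfrac{1}{2}p)$ for small $p$ and integrates $C(p)\,r^{p-1}g(r^2)$ with a simple midpoint rule. The integration routine, the upper limit of 12, and the function names are choices made here for illustration.

```python
import math


def sphere_area(p: int) -> float:
    """C(p) = 2 pi^{p/2} / Gamma(p/2), the surface area of the unit sphere in p dimensions."""
    return 2.0 * math.pi ** (p / 2.0) / math.gamma(p / 2.0)


def radial_density_normal(r: float, p: int) -> float:
    """C(p) r^{p-1} g(r^2) with g(u) = (2 pi)^{-p/2} exp(-u/2), the N(0, I) case."""
    g = (2.0 * math.pi) ** (-p / 2.0) * math.exp(-0.5 * r * r)
    return sphere_area(p) * r ** (p - 1) * g


def midpoint_integral(f, a: float, b: float, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


circumference = sphere_area(2)    # unit circle: 2 pi
surface = sphere_area(3)          # unit sphere in three dimensions: 4 pi

# The radial density should integrate to 1; the tail beyond r = 12 is negligible.
total_mass = midpoint_integral(lambda r: radial_density_normal(r, 3), 0.0, 12.0)

# Closed form of the same density, r^{p-1} exp(-r^2/2) / [2^{p/2-1} Gamma(p/2)].
closed_form_at_1 = math.exp(-0.5) / (2.0 ** 0.5 * math.gamma(1.5))
```

The values of `sphere_area` for $p = 2$ and $p = 3$ reproduce the familiar $2\pi$ and $4\pi$, and the radial density integrates to 1 up to quadrature error.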

The constant $C(p)$ is the surface area of a sphere of unit radius in $p$
dimensions. The random vector $U$ with coordinates $\sin\Theta_1$, $\cos\Theta_1\sin\Theta_2$, ...,







/ [<f (A')] for all c > 0, then fU 


(入 (P. 


2.7.2. Distributions 






has the density $|C\Lambda C'|^{-1/2}\,g[(x - C\nu)'(C\Lambda C')^{-1}(x - C\nu)]$ for $C$ nonsingular.

The generalization of Theorem 2.4.4 is the following: If $X$ has the density
(2), then $Z = DX$ has the density

(20)  $|D\Lambda D'|^{-1/2}\,g_2[(z - D\nu)'(D\Lambda D')^{-1}(z - D\nu)],$





e i, ，0 y g(y'y)dy l … 办 p 


2.7.3. Conditional Distributions and Multiple

The density of the conditional distribution of $Y_1$ given $Y_2 = y_2$ when $Y = (Y_1', Y_2')'$
has the spherical density $g(y'y)$ is

(25)  $\dfrac{g(y_1'y_1 + y_2'y_2)}{g_2(y_2'y_2)} = \dfrac{g(y_1'y_1 + r_2^2)}{g_2(r_2^2)},$

where the marginal density $g_2(y_2'y_2)$ is given by (17) and $r_2^2 = y_2'y_2$. In terms
of $y_1$, (25) is a spherically contoured distribution (depending on $r_2^2$).

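Now consider $X = (X^{(1)\prime}, X^{(2)\prime})'$ with density (2). The conditional density of
$X^{(1)}$ given $X^{(2)} = x^{(2)}$ is

(26)  $|\Lambda_{11\cdot2}|^{-1/2}\,g\{[x^{(1)} - \nu^{(1)} - B(x^{(2)} - \nu^{(2)})]'\Lambda_{11\cdot2}^{-1}[x^{(1)} - \nu^{(1)} - B(x^{(2)} - \nu^{(2)})]$
      $\qquad + (x^{(2)} - \nu^{(2)})'\Lambda_{22}^{-1}(x^{(2)} - \nu^{(2)})\} \div g_2[(x^{(2)} - \nu^{(2)})'\Lambda_{22}^{-1}(x^{(2)} - \nu^{(2)})]$
      $= |\Lambda_{11\cdot2}|^{-1/2}\,g\{[x^{(1)} - \nu^{(1)} - B(x^{(2)} - \nu^{(2)})]'\Lambda_{11\cdot2}^{-1}[x^{(1)} - \nu^{(1)} - B(x^{(2)} - \nu^{(2)})] + r_2^2\} \div g_2(r_2^2),$

where $r_2^2 = (x^{(2)} - \nu^{(2)})'\Lambda_{22}^{-1}(x^{(2)} - \nu^{(2)})$ and $B = \Lambda_{12}\Lambda_{22}^{-1}$. The density (26) is
elliptically contoured in $x^{(1)} - \nu^{(1)} - B(x^{(2)} - \nu^{(2)})$ as a function of $x^{(1)}$. The
conditional mean of $X^{(1)}$ given $X^{(2)} = x^{(2)}$ is

(27)  $\mathcal{E}(X^{(1)} \mid x^{(2)}) = \nu^{(1)} + B(x^{(2)} - \nu^{(2)})$

if $\mathcal{E}(R_1^2 \mid Y_2'Y_2 = r_2^2) < \infty$ in (25), where $R_1^2 = Y_1'Y_1$. Also the conditional covariance
matrix is $(\mathcal{E}R_1^2/q)\Lambda_{11\cdot2}$. It follows that Definition 2.5.2 of the partial
correlation coefficient holds when $(\sigma_{ij\cdot q+1,\ldots,p}) = \Sigma_{11\cdot2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
and $\Sigma$ is the parameter matrix given above.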

0^1 0 ^^ and 2.5.4 are true for anv elliptically contoured 


where 
nal O 


found 
tion ‘ 




(33) 




f e'^giz'z) dz x **• dz p 

j-<X) 


has the density g(y f y)^ The equality (28) for all orthogo- 
z is a function of ft. We write 


: 也 （m 


y. Conversely, any characteristic tunction oi me iovm 
jrresponding to a density corresponds to a random vector X 
: y ⑵. 

lts of X with an elliptically contoured distribution can be 
e characteristic function e H ^(t'U) or from the representa- 
?Cl/, where C A 'C = /. Note that 

$\mathcal{E}R^2 = C(p)\int_0^\infty r^{p+1}g(r^2)\,dr = -2p\,\phi'(0),$

$\mathcal{E}R^4 = C(p)\int_0^\infty r^{p+3}g(r^2)\,dr = 4p(p + 2)\,\phi''(0).$

tie higher-order moments of y = /?l/. The odd-order moments 
id hence the odd-order moments of F are 0. 

SiX,- tx. i ){X r iMj)(X k - 叫 ）= 0 . 

oments of X- jji of odd order are 0. 

Because $U'U = 1$,


$1 = \sum_{i,j=1}^{p} \mathcal{E}U_i^2U_j^2 = p\,\mathcal{E}U_1^4 + p(p - 1)\,\mathcal{E}U_1^2U_2^2.$







[/4 = 3/[^(^ + 2)]; then (34) implies 
^ = ?,£R A /[p{p + 2)\ and = 

:l or / = / ¥= A: = / or i = k ♦卜 I or 
).To summarize SU i U j U k U l = {^ ij ^ki Jr 
h-order moments of X are 

.X k - 卜 k )(Xi_ ft!) 

入 A / + 入汄入 "+ 入"入 M ) 

y(o- lJ (r J(/ + o- ik (rj, + 


standard deviation is 

p^ +2) \ p -l 

」 ( 字） 

= V/r -) 2 ” 2 1 1 

— 3k, 

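say. This is known as the kurtosis. (Note that $\kappa$ is $\tfrac{1}{3}\mathcal{E}\{(X_i - \mu_i)^4/\sigma_{ii}^2\} - 1$.)
The standardized fourth cumulant is $3\kappa$ for every
component of $X$. The fourth cumulant of $X_i$, $X_j$, $X_k$, and $X_l$ is

(37)  $\kappa_{ijkl} = \mathcal{E}(X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k)(X_l - \mu_l) - (\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk})$
      $= \kappa(\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}).$

For the normal distribution $\kappa = 0$. The fourth-order moments can be written

(38)  $\mathcal{E}(X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k)(X_l - \mu_l) = (1 + \kappa)(\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}).$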

More detail about elliptically contoured distributions can be found in Fang 
and Zhang (1990). 


















R 2 \\Y \\ 2 _ _mxl 


If $X = \mu + CY$, the density of $X$ is


(41) 以 Al 


>-|^) 


(2) Contaminated normal. The contaminated normal distribution is a mixture
of two normal distributions with proportional covariance matrices and
the same mean vector. The density can be written


fi) (27T) P/2 | 


where $c > 0$ and $0 < \varepsilon < 1$. Usually $\varepsilon$ is rather small and $c$ rather large.
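
To see how much heavier the tails of such a mixture are, one can compute a kurtosis parameter $\kappa$, taken here as one third of the standardized fourth moment minus 1. The sketch rests on an assumed parametrization that is not spelled out in the surviving text: with probability $1 - \varepsilon$ the covariance matrix is $\Lambda$ and with probability $\varepsilon$ it is $c\Lambda$. Under that assumption each component has variance proportional to $(1 - \varepsilon) + \varepsilon c$ and fourth moment proportional to $3[(1 - \varepsilon) + \varepsilon c^2]$.

```python
def kurtosis_parameter(eps: float, c: float) -> float:
    """Kurtosis parameter kappa of a two-component normal scale mixture.

    Assumed parametrization: scale factor 1 with probability 1 - eps and
    scale factor c with probability eps (same mean, proportional covariances).
    """
    mean_scale = (1.0 - eps) + eps * c               # expected scale factor
    mean_square_scale = (1.0 - eps) + eps * c * c    # expected squared scale factor
    # One third of the standardized fourth moment, minus 1.
    return mean_square_scale / mean_scale ** 2 - 1.0


kappa_normal = kurtosis_parameter(0.0, 9.0)      # no contamination
kappa_same_scale = kurtosis_parameter(0.3, 1.0)  # both components identical
kappa_contaminated = kurtosis_parameter(0.1, 9.0)
```

With no contamination, or with $c = 1$, the parameter is 0 as in the normal case; a small $\varepsilon$ combined with a large $c$ gives a clearly positive value.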


























CHAPTER 3 


Estimation of the Mean Vector 
and the Covariance Matrix 


3.1. INTRODUCTION 


3.2. THE MAXIMUM LIKELIHOOD ESTIMATORS OF THE MEAN 
VECTOR AND THE COVARIANCE MATRIX 


Given a sample of (vector) observations from a $p$-variate (nondegenerate)
normal distribution, we ask for estimators of the mean vector $\mu$ and the











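of squares and cross products of deviations about the mean be

(4)  $A = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})' = \left[\sum_{\alpha=1}^{N} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j)\right], \qquad i, j = 1, \ldots, p.$

It will be convenient to use the following lemma:

Lemma 3.2.1. Let $x_1, \ldots, x_N$ be $N$ ($p$-component) vectors, and let $\bar{x}$ be
defined by (3). Then for any vector $b$

(5)  $\sum_{\alpha=1}^{N} (x_\alpha - b)(x_\alpha - b)' = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})' + N(\bar{x} - b)(\bar{x} - b)'.$

Proof.

(6)  $\sum_{\alpha=1}^{N} (x_\alpha - b)(x_\alpha - b)' = \sum_{\alpha=1}^{N} [(x_\alpha - \bar{x}) + (\bar{x} - b)][(x_\alpha - \bar{x}) + (\bar{x} - b)]'$
     $= \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})' + \left[\sum_{\alpha=1}^{N} (x_\alpha - \bar{x})\right](\bar{x} - b)'$
     $\quad + (\bar{x} - b)\sum_{\alpha=1}^{N} (x_\alpha - \bar{x})' + N(\bar{x} - b)(\bar{x} - b)'.$

The second and third terms on the right-hand side are 0 because
$\sum (x_\alpha - \bar{x}) = \sum x_\alpha - N\bar{x} = 0$ by (3). ∎

When we let $b = \mu^*$, we have

(7)  $\sum_{\alpha=1}^{N} (x_\alpha - \mu^*)(x_\alpha - \mu^*)' = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})' + N(\bar{x} - \mu^*)(\bar{x} - \mu^*)'$
     $= A + N(\bar{x} - \mu^*)(\bar{x} - \mu^*)'.$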
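
The identity of Lemma 3.2.1 is easy to confirm on a few points. The sketch below uses a made-up sample of four two-component vectors and an arbitrary vector `b`; none of these numbers come from the text.

```python
def outer(u, v):
    """Outer product u v' of two vectors, as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def scale(k, a):
    return [[k * entry for entry in row] for row in a]


def scatter_about(points, center):
    """Sum over the sample of (x - center)(x - center)'."""
    p = len(center)
    total = [[0.0] * p for _ in range(p)]
    for x in points:
        d = [xi - ci for xi, ci in zip(x, center)]
        total = add(total, outer(d, d))
    return total


# Hypothetical sample of N = 4 observations on p = 2 variables.
points = [[1.0, 2.0], [3.0, 0.5], [2.0, 4.0], [6.0, 1.5]]
n = len(points)
mean = [sum(x[i] for x in points) / n for i in range(2)]
b = [0.5, -1.0]    # arbitrary vector

left_side = scatter_about(points, b)
a_matrix = scatter_about(points, mean)          # the matrix A of (4)
shift = [m - bi for m, bi in zip(mean, b)]
right_side = add(a_matrix, scale(n, outer(shift, shift)))
```

The two matrices `left_side` and `right_side` agree entry by entry, whatever `b` is chosen.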



Lemma 3.2.2. If $D$ is positive definite of order $p$, the maximum of

(9)  $f(G) = -N\log|G| - \operatorname{tr} G^{-1}D$

with respect to positive definite matrices $G$ exists, occurs at $G = (1/N)D$, and
has the value

(10)  $f[(1/N)D] = pN\log N - N\log|D| - pN.$
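
In the scalar case $p = 1$ the lemma reduces to maximizing $-N\log g - d/g$ over $g > 0$, which can be checked directly. The sketch below assumes hypothetical values of $N$ and $d$ and compares the value at $g = d/N$ with the closed form and with nearby points.

```python
import math


def f(g: float, n: int, d: float) -> float:
    """f(G) = -N log|G| - tr(G^{-1} D) in the scalar case p = 1."""
    return -n * math.log(g) - d / g


n, d = 10, 25.0            # assumed values, for illustration only
g_hat = d / n              # claimed maximizer G = (1/N) D
f_max = f(g_hat, n, d)

# Closed form p N log N - N log|D| - p N with p = 1.
closed_form = n * math.log(n) - n * math.log(d) - n

# Values of f on a grid around the maximizer.
nearby = [f(g_hat * (1.0 + 0.05 * k), n, d) for k in range(-10, 11) if k != 0]
```

Every value in `nearby` is smaller than `f_max`, and `f_max` matches `closed_form`.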









estimators of <f) m are unique. 


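Corollary 3.2.2. If $x_1, \ldots, x_N$ constitutes a sample from $N(\mu, \Sigma)$, where
$\sigma_{ij} = \sigma_i\sigma_j\rho_{ij}$ $(\rho_{ii} = 1)$, then the maximum likelihood estimator of $\mu$ is
$\hat{\mu} = \bar{x} = (1/N)\sum_\alpha x_\alpha$; the maximum likelihood estimator of $\sigma_i^2$ is
$\hat{\sigma}_i^2 = (1/N)\sum_\alpha (x_{i\alpha} - \bar{x}_i)^2 = (1/N)(\sum_\alpha x_{i\alpha}^2 - N\bar{x}_i^2)$,
where $x_{i\alpha}$ is the $i$th component of $x_\alpha$ and $\bar{x}_i$ is the
$i$th component of $\bar{x}$; and the maximum likelihood estimator of $\rho_{ij}$ is

$\hat{\rho}_{ij} = \dfrac{\sum_{\alpha=1}^{N} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j)}{\sqrt{\sum_{\alpha=1}^{N} (x_{i\alpha} - \bar{x}_i)^2}\,\sqrt{\sum_{\alpha=1}^{N} (x_{j\alpha} - \bar{x}_j)^2}}.$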
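
The estimators in the corollary can be sketched for a bivariate sample as follows. The function name and the toy data are hypothetical; note the divisor $N$ (not $N - 1$) in the variance estimators, as in the maximum likelihood solution.

```python
import math


def mle_summary(sample):
    """Maximum likelihood estimates of the means, variances, and correlation
    for a bivariate normal sample given as a list of (x1, x2) pairs."""
    n = len(sample)
    mean1 = sum(x1 for x1, _ in sample) / n
    mean2 = sum(x2 for _, x2 in sample) / n
    # Sums of squares and cross products of deviations about the means.
    a11 = sum((x1 - mean1) ** 2 for x1, _ in sample)
    a22 = sum((x2 - mean2) ** 2 for _, x2 in sample)
    a12 = sum((x1 - mean1) * (x2 - mean2) for x1, x2 in sample)
    return {
        "mean": (mean1, mean2),
        "variance": (a11 / n, a22 / n),          # divisor N
        "correlation": a12 / math.sqrt(a11 * a22),
    }


# Toy data lying exactly on a line, so the sample correlation is 1.
toy_sample = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
estimates = mle_summary(toy_sample)
```

The correlation estimate does not depend on whether the divisor is $N$ or $N - 1$, since the divisor cancels between numerator and denominator.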


Proof. The set of parameters 
one-to-one transform of the se 
Corollary 3.2.1 the estimator of 






(18) Pa = 


Pearson (1896) gave a justifi 
sometimes called the Pearson 
simple correlation coefficient. It i; 






















e a joint normal distril 
t of linear combinations 
joint normal distributioi 


)e defined later.) 

THE SAMPLE MEAN 


covariance matrix oeiween 

€(Y at Y；) = ^(Y a - 

〃[£( 

L/3=i 

N 

■ - E c ( 

/3,右 =1 
N 

=E ^ 

/3,右 =1 
N 

=E C afi 
J3=l 

= 8 ayS, 


8 is the Kronecker 










following general lemma: 








3.3 THE DISTRIBUTION OF THE SAMPLE MEAN VECTOR 
has a (central) $\chi^2$-distribution with $p$ degrees of freedom. This is the
fundamental fact we use in setting up tests and confidence regions concerning
$\mu$.

Let $\chi_p^2(\alpha)$ be the number such that

(13)  $\Pr\{\chi_p^2 > \chi_p^2(\alpha)\} = \alpha.$

Thus

(14)  $\Pr\{N(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu) > \chi_p^2(\alpha)\} = \alpha.$
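
For $p = 2$ the significance point has a closed form, because a $\chi^2$ variable with 2 degrees of freedom exceeds $c$ with probability $e^{-c/2}$. The sketch below uses that fact to carry out the test with known covariance matrix; the sample mean, hypothesized mean, covariance matrix, and sample size are all hypothetical.

```python
import math


def chi2_point_2df(alpha: float) -> float:
    """Significance point of chi-square with 2 degrees of freedom:
    Pr{chi2 > c} = exp(-c / 2) = alpha gives c = -2 log(alpha)."""
    return -2.0 * math.log(alpha)


def mean_test_statistic(xbar, mu0, sigma, n):
    """N (xbar - mu0)' Sigma^{-1} (xbar - mu0) for p = 2 and known Sigma."""
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    inv = [[sigma[1][1] / det, -sigma[0][1] / det],
           [-sigma[1][0] / det, sigma[0][0] / det]]
    d = [xbar[0] - mu0[0], xbar[1] - mu0[1]]
    return n * sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


# Hypothetical numbers: identity covariance matrix, N = 20.
critical_value = chi2_point_2df(0.05)
statistic = mean_test_statistic([0.5, 0.2], [0.0, 0.0],
                                [[1.0, 0.0], [0.0, 1.0]], 20)
reject = statistic > critical_value
```

For other values of $p$ the significance point has no such elementary form and would be read from tables or a statistical library.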







































3.4. THEORETICAL PROPERTIES OF ESTIMATORS
OF THE MEAN VECTOR

3.4.1. Properties of Maximum Likelihood Estimators

It was shown in Section 3.3.1 that $\bar{x}$ and $S$ are unbiased estimators of $\mu$ and
$\Sigma$. In this subsection we shall show that $\bar{x}$ and $S$ are sufficient








To obtain the power function of the test (15), we note that $\sqrt{N}(\bar{X} - \mu_0)$
has the distribution $N[\sqrt{N}(\mu - \mu_0), \Sigma]$. From Theorem 3.3.3 we obtain the
following corollary:

Corollary 3.3.1. If $\bar{X}$ is the mean of a random sample of $N$ drawn from
$N(\mu, \Sigma)$, then $N(\bar{X} - \mu_0)'\Sigma^{-1}(\bar{X} - \mu_0)$ has a noncentral $\chi^2$-distribution with $p$




gative and h(y) does not depend on 0. 




■ 


i) + (N_l)trX 一 1 S]}. 

5, |i, 2, and the middle 
each case h(x u "., x N ) 









Completeness 

To prove an optimality property of the $T^2$-test (Section 5.5), we need the
result that $(\bar{x}, S)$ is a complete sufficient set of statistics for $(\mu, \Sigma)$.










called a co 































(10) 财 （t 一 d)(i 

is positive semidefinite. (Other 














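If the density of $X$ given $\theta$ is $f(x \mid \theta)$, the joint density of $X$ and $\theta$ is $f(x \mid \theta)\rho(\theta)$ and the
average risk of a procedure $\delta(x)$ is

(20)  $r(\rho, \delta) = \iint L[\theta, \delta(x)]\,f(x \mid \theta)\,\rho(\theta)\,dx\,d\theta = \int\left\{\int L[\theta, \delta(x)]\,g(\theta \mid x)\,d\theta\right\} f(x)\,dx,$

where

(21)  $f(x) = \int f(x \mid \theta)\,\rho(\theta)\,d\theta, \qquad g(\theta \mid x) = \dfrac{f(x \mid \theta)\,\rho(\theta)}{f(x)}$

are the marginal density of $X$ and the a posteriori density of $\theta$ given $x$. The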































Let (23) be $d(x)$. Its average risk is
(29) 

= 4tre[o-4»(4»+^2) 

= tre(/ + lxo- ； "lx-itrex 

as $\Phi^{-1} \to 0$. ∎

For more discussion of decision theory see Ferguson (1967), DeGroot
(1970), or Berger (1980b).

3.5. IMPROVED ESTIMATION OF THE MEAN 
3.5.1. Introduction 

The sample mean $\bar{x}$ seems the natural estimator of the population mean $\mu$
based on a sample from $N(\mu, \Sigma)$. It is the maximum likelihood estimator, a
sufficient statistic when $\Sigma$ is known, and the minimum variance unbiased
estimator. Moreover, it is equivariant in the sense that if an arbitrary vector $v$
is added to each observation vector and to $\mu$, the error of estimation
$(\bar{x} + v) - (\mu + v) = \bar{x} - \mu$ is independent of $v$; in other words, the error
does not depend on the choice of origin. However, Stein (1956b) showed the
startling fact that this conventional estimator is not admissible with respect to
the loss function that is the sum of mean squared errors of the components
when $\Sigma = I$ and $p \ge 3$. James and Stein (1961) produced an estimator which
has a smaller sum of mean squared errors; this estimator will be studied in
Section 3.5.2. Subsequent studies have shown that the phenomenon is
widespread and the implications imperative.
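
A minimal sketch of a shrinkage estimator of the James and Stein type may help fix ideas. It assumes a single observation $y$ from $N(\mu, I)$ with $p \ge 3$ and an arbitrary fixed point $\nu$ toward which the observation is shrunk, and it uses the usual form $(1 - (p - 2)/\|y - \nu\|^2)(y - \nu) + \nu$. The function name and the handling of the degenerate case $y = \nu$ are choices made here for illustration.

```python
def james_stein(y, nu):
    """Shrink the observation y toward the fixed point nu.

    Assumes y is one observation from N(mu, I) with p >= 3 components.
    """
    p = len(y)
    if p < 3:
        raise ValueError("the shrinkage factor p - 2 requires p >= 3")
    deviation = [yi - vi for yi, vi in zip(y, nu)]
    norm_squared = sum(d * d for d in deviation)
    if norm_squared == 0.0:
        # The formula is undefined at y = nu; returning nu is a convention chosen here.
        return list(nu)
    shrink = 1.0 - (p - 2) / norm_squared
    return [vi + shrink * d for vi, d in zip(nu, deviation)]


# Example: ||y||^2 = 25 and p = 3, so the shrinkage factor is 1 - 1/25 = 0.96.
estimate = james_stein([3.0, 0.0, 4.0], [0.0, 0.0, 0.0])
```

The estimate moves each coordinate a little toward $\nu$; the closer $y$ lies to $\nu$, the stronger the shrinkage.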








inadmissible 


























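Proof of Lemma. We write the left-hand side of (5) as

(6)  $\int_{\theta}^{\infty} [f(x) - f(\theta)](x - \theta)\dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-\theta)^2}\,dx + \int_{-\infty}^{\theta} [f(x) - f(\theta)](x - \theta)\dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-\theta)^2}\,dx$
     $= \int_{\theta}^{\infty}\int_{\theta}^{x} f'(y)(x - \theta)\dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-\theta)^2}\,dy\,dx - \int_{-\infty}^{\theta}\int_{x}^{\theta} f'(y)(x - \theta)\dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-\theta)^2}\,dy\,dx$
     $= \int_{\theta}^{\infty}\int_{y}^{\infty} f'(y)(x - \theta)\dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-\theta)^2}\,dx\,dy - \int_{-\infty}^{\theta}\int_{-\infty}^{y} f'(y)(x - \theta)\dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-\theta)^2}\,dx\,dy,$

which yields the right-hand side of (5). Fubini's theorem justifies the
interchange of order of integration. (See Problem 3.22.) ∎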

The lemma can also be derived by integration by parts in special cases. 

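Proof of Theorem 3.5.1. The difference in risks is

(7)  $\Delta R(\mu) = \mathcal{E}\{\|Y - \mu\|^2 - \|m(Y) - \mu\|^2\}$
     $= \mathcal{E}\left\{\|Y - \mu\|^2 - \left\|\left(1 - \dfrac{p - 2}{\|Y - \nu\|^2}\right)(Y - \nu) + \nu - \mu\right\|^2\right\}$
     $= \mathcal{E}\left\{2\,\dfrac{p - 2}{\|Y - \nu\|^2}\sum_{i=1}^{p} (Y_i - \mu_i)(Y_i - \nu_i) - \dfrac{(p - 2)^2}{\|Y - \nu\|^2}\right\}$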







































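Theorem 3.5.3. Let $r(z)$, $0 \le z < \infty$, be a nondecreasing differentiable function
such that $0 \le r(z) \le 2(p - 2)$. Then for $p \ge 3$

(21)  $m(\bar{x}) = \left(I - \dfrac{r\left(N^2(\bar{x} - \nu)'\Sigma^{-1}Q^{-1}\Sigma^{-1}(\bar{x} - \nu)\right)}{N(\bar{x} - \nu)'\Sigma^{-1}Q^{-1}\Sigma^{-1}(\bar{x} - \nu)}\,Q^{-1}\Sigma^{-1}\right)(\bar{x} - \nu) + \nu$

has smaller risk than $\bar{x}$ and is minimax.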

Proof. There exists a matrix $C$ such that $C'QC = I$ and $(1/N)\Sigma = C\Delta C'$,
where $\Delta$ is diagonal with diagonal elements $\delta_1 \ge \delta_2 \ge \cdots \ge \delta_p > 0$ (Theorem
A.2.2 of the Appendix). Let $\bar{x} = Cy + \nu$ and $\mu = C\mu^* + \nu$. Then $y$ has the
distribution $N(\mu^*, \Delta)$, and the transformed loss function is

(22)  $L^*(m^*, \mu^*) = (m^* - \mu^*)'(m^* - \mu^*) = \|m^* - \mu^*\|^2.$

The estimator (21) of $\mu$ is transformed to the estimator of $\mu^* = C^{-1}(\mu - \nu)$,


We now proceed as in the proof of Theorem 3.5.1. The difference in risks
between $y$ and $m^*$ is


jt*l| 2 -|lm*(F)- |x*H 2 } 

Since dz) is differentiable, we use Lemma 3.5.1 with (x - 61) = (y, - )S, 

and 

r(y'X~ 2 y) 

( 25 ) /(y,) = 

/ 2 r'(y'\- 2 y) yf 2 r(y'A~ 2 y) yf 

㈤ n^ = J ^r + -2 厂歹一 S, 2 _ 







(33)-4= EW £ 袅 <V[W( ;) ) -Ml 2 

= L^L^wm -^*] 2 

j=i t=i 



< E ay ； = E (l* 

;-l ;=1 

=WK)*, 


and hence the estimator defined by (32) is minimax. 


Since the expected value of $G(Y)$ with respect to (32) is (31) and the loss
function is convex, the risk of the estimator (31) is less than that of the
randomized estimator (by Jensen's inequality).


3.6. ELLIPTICALLY CONTOURED DISTRIBUTIONS 
3.6.1. Observations Elliptically Contoured 

Let $x_1, \ldots, x_N$ be $N$ $(= n + 1)$ independent observations on a random vector

. . I . I - i r/ \/ * - l ^ 丁 Up H^ncin? of thp <;amnle is 





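(6)  $\mathcal{E}\sum_{\alpha,\beta=1}^{N} (x_{i\alpha} - \mu_i)(x_{j\alpha} - \mu_j)(x_{k\beta} - \mu_k)(x_{l\beta} - \mu_l)$
     $= N\,\mathcal{E}(x_{i\alpha} - \mu_i)(x_{j\alpha} - \mu_j)(x_{k\alpha} - \mu_k)(x_{l\alpha} - \mu_l)$
     $\quad + N(N - 1)\,\mathcal{E}(x_{i\alpha} - \mu_i)(x_{j\alpha} - \mu_j)\,\mathcal{E}(x_{k\beta} - \mu_k)(x_{l\beta} - \mu_l)$
     $= N(1 + \kappa)(\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}) + N(N - 1)\sigma_{ij}\sigma_{kl},$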









































ured distributions. 

ient set of statistics that is translation-invariant, 
function of S. Thus inference concerning X can 

)be a vector-valued function of X (Nxp) such 


Table 3.3† Head Lengths and Breadths of Brothers

Head Length,    Head Breadth,   Head Length,    Head Breadth,
First Son,      First Son,      Second Son,     Second Son,
$x_1$           $x_2$           $x_3$           $x_4$

191             155             179             145
195             149             201             152
181             148             185             149
183             153             188             149
176             144             171             142
208             157             192             152
189             150             190             149
197             159             189             152
188             152             197             159
192             150             187             151
179             158             186             148
183             147             174             147
174             150             185             152
190             159             195             157
188             151             187             158
163             137             161             130
195             153             183             158
186             153             173             148
181             145             182             146
175             140             165             137
192             154             185             152
174             143             178             147
176             139             176             143
197             167             200             158
190             163             187             150

† These data, used in examples in the first edition of this book, came from Rao
(1952), p. 245. Izenman (1980) has indicated some entries were apparently
incorrectly copied from Frets (1921) and corrected them (p. 579).
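
As an illustration of the estimators of this chapter, the sketch below computes the sample mean vector and the unbiased sample covariance matrix from the 25 rows of Table 3.3. The rows are transcribed from the table as printed here; the second entries of the 16th and 19th rows are damaged in this copy and are read as 137 and 145, which is an assumption. The function names are hypothetical.

```python
# Rows of Table 3.3: (x1, x2, x3, x4) for each pair of brothers.
# The values 137 (row 16) and 145 (row 19) are assumed readings of damaged entries.
HEAD_DATA = [
    (191, 155, 179, 145), (195, 149, 201, 152), (181, 148, 185, 149),
    (183, 153, 188, 149), (176, 144, 171, 142), (208, 157, 192, 152),
    (189, 150, 190, 149), (197, 159, 189, 152), (188, 152, 197, 159),
    (192, 150, 187, 151), (179, 158, 186, 148), (183, 147, 174, 147),
    (174, 150, 185, 152), (190, 159, 195, 157), (188, 151, 187, 158),
    (163, 137, 161, 130), (195, 153, 183, 158), (186, 153, 173, 148),
    (181, 145, 182, 146), (175, 140, 165, 137), (192, 154, 185, 152),
    (174, 143, 178, 147), (176, 139, 176, 143), (197, 167, 200, 158),
    (190, 163, 187, 150),
]


def mean_vector(rows):
    """Sample mean of each column."""
    n = len(rows)
    p = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(p)]


def covariance_matrix(rows, divisor):
    """Sums of squares and cross products about the mean, divided by `divisor`
    (N - 1 for the unbiased estimator S, N for the maximum likelihood estimator)."""
    xbar = mean_vector(rows)
    p = len(xbar)
    return [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in rows) / divisor
             for j in range(p)] for i in range(p)]


n_pairs = len(HEAD_DATA)
xbar = mean_vector(HEAD_DATA)
s_unbiased = covariance_matrix(HEAD_DATA, n_pairs - 1)
sigma_mle = covariance_matrix(HEAD_DATA, n_pairs)
```

The two covariance estimates differ only by the factor $(N - 1)/N$.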


eight (in grams)! [Data from Fisher (1947b).] 

(a) In a sample of 47 female cats the relevant data are

$\sum x_\alpha = \begin{pmatrix} 110.9 \\ 432.5 \end{pmatrix}, \qquad \sum x_\alpha x_\alpha' = \begin{pmatrix} 265.13 & 1029.62 \\ 1029.62 & 4064.71 \end{pmatrix}.$































































of x a and 
1 N 

= 刃 E X ia ， 

a=* 1 

hp. distribution of r, ; when the population 




























4.2 CORRELATION COEFFICIENT OF A BIVARIATE SAMPLE
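Write $V_i = (Z_{i1}, \ldots, Z_{in})'$, $i = 1, 2$, to denote random vectors. The conditional
distribution of $Z_{2\alpha}$ given $Z_{1\alpha} = z_{1\alpha}$ is $N(\beta z_{1\alpha}, \sigma^2)$, where $\beta = \rho\sigma_2/\sigma_1$
and $\sigma^2 = \sigma_2^2(1 - \rho^2)$. (See Section 2.5.) The density of $V_2$ given $V_1 = v_1$ is
$N(\beta v_1, \sigma^2 I)$ since the $Z_{2\alpha}$ are independent. Let $b = V_2'v_1/v_1'v_1$ $(= a_{21}/a_{11})$,
so that $bv_1'(V_2 - bv_1) = 0$, and let $U = (V_2 - bv_1)'(V_2 - bv_1) = V_2'V_2 - b^2v_1'v_1$
$(= a_{22} - a_{12}^2/a_{11})$. Then $\cot\theta = b\sqrt{a_{11}/U}$. The rotation of coordinate axes
involves choosing an $n \times n$ orthogonal matrix $C$ with first row $(1/c)v_1'$, where
$c^2 = v_1'v_1$.

We now apply Theorem 3.3.1 with $X_\alpha = Z_{2\alpha}$. Let $Y_\alpha = \sum_\beta c_{\alpha\beta}Z_{2\beta}$, $\alpha = 1, \ldots, n$.
Then $Y_1, \ldots, Y_n$ are independently normally distributed with variance
$\sigma^2$ and means

(10)  $\mathcal{E}Y_1 = \sum_{\gamma=1}^{n} c_{1\gamma}\beta z_{1\gamma} = \dfrac{\beta}{c}\sum_{\gamma=1}^{n} z_{1\gamma}^2 = \beta c,$

(11)  $\mathcal{E}Y_\alpha = \sum_{\gamma=1}^{n} c_{\alpha\gamma}\beta z_{1\gamma} = \beta c\sum_{\gamma=1}^{n} c_{\alpha\gamma}c_{1\gamma} = 0, \qquad \alpha \ne 1.$

We have $b = \sum_{\alpha=1}^{n} Z_{2\alpha}z_{1\alpha}/\sum_{\alpha=1}^{n} z_{1\alpha}^2 = c\sum_{\alpha=1}^{n} Z_{2\alpha}c_{1\alpha}/c^2 = Y_1/c$ and, from
Lemma 3.3.1,

(12)  $U = \sum_{\alpha=1}^{n} Z_{2\alpha}^2 - b^2\sum_{\alpha=1}^{n} z_{1\alpha}^2 = \sum_{\alpha=1}^{n} Y_\alpha^2 - Y_1^2 = \sum_{\alpha=2}^{n} Y_\alpha^2,$

which is independent of $b$. Then $U/\sigma^2$ has a $\chi^2$-distribution with $n - 1$
degrees of freedom.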
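
The relation between the regression quantities $b$ and $U$ and the correlation can be checked on a small pair of vectors. The sketch below uses made-up vectors playing the role of $v_1$ and $V_2$ and compares $b\sqrt{a_{11}/U}$ with $r/\sqrt{1 - r^2}$, where $r = a_{12}/\sqrt{a_{11}a_{22}}$ is the cosine of the angle between the two vectors.

```python
import math

# Hypothetical vectors standing in for v1 and V2 (no means subtracted).
z1 = [1.0, -0.5, 2.0, 0.3, -1.2]
z2 = [0.8, 0.1, 1.5, -0.4, -0.9]

a11 = sum(u * u for u in z1)
a22 = sum(v * v for v in z2)
a12 = sum(u * v for u, v in zip(z1, z2))

b = a12 / a11                       # regression coefficient of V2 on v1
u_stat = a22 - a12 * a12 / a11      # squared length of the residual V2 - b v1
r = a12 / math.sqrt(a11 * a22)      # cosine of the angle between the vectors

cot_from_regression = b * math.sqrt(a11 / u_stat)
cot_from_correlation = r / math.sqrt(1.0 - r * r)

# The residual is orthogonal to v1.
residual = [v - b * u for u, v in zip(z1, z2)]
inner_product = sum(u * e for u, e in zip(z1, residual))
```

Both expressions give the same cotangent, and the residual has zero inner product with the first vector.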




























$\alpha = 1, \ldots, N$, and $r$ is essentially the only invariant of the sufficient statistics
(Problem 3.7). The above procedure for testing $H\colon \rho = \rho_0$ against alternatives
$\rho > \rho_0$ is uniformly most powerful among all invariant tests. (See
Problems 4.16, 4.17, and 4.18.)

As an example suppose one wishes to test the hypothesis that $\rho = 0.5$
against alternatives $\rho \ne 0.5$ at the 5% level of significance using the correlation
observed in a sample of 15. In David's tables we find (by interpolation)
that $F(0.027 \mid 15, 0.5) = 0.025$ and $F(0.805 \mid 15, 0.5) = 0.975$. Hence we reject
the hypothesis if our sample $r$ is less than 0.027 or greater than 0.805.

Secondly, we can use David's tables to compute the power function of
a test of correlation. If the region of rejection of $H$ is $r > r_1$ and $r < r_1'$,
the power of the test is a function of the true correlation $\rho$, namely
$[1 - F(r_1 \mid N, \rho)] + [F(r_1' \mid N, \rho)]$; this is the probability of rejecting the null
hypothesis when the population correlation is $\rho$.

As an example consider finding the power function of the test for $\rho = 0$
considered in the preceding section. The rejection region (one-sided) is
$r > 0.5494$ at the 5% significance level. The probabilities of rejection are
given in Table 4.1. The graph of the power function is illustrated in Figure
4.2.

Thirdly, David's computations lead to confidence regions for $\rho$. For given
$N$, $r_1'$ (defining a significance point) is a function of $\rho$, say $f_1(\rho)$, and $r_1$ is
another function of $\rho$, say $f_2(\rho)$, such that

(44)  $\Pr\{f_1(\rho) < r < f_2(\rho) \mid \rho\} = 1 - \alpha.$

Clearly, $f_1(\rho)$ and $f_2(\rho)$ are monotonically increasing functions of $\rho$ if $r_1$
and $r_1'$ are chosen so $1 - F(r_1 \mid N, \rho) = \tfrac{1}{2}\alpha = F(r_1' \mid N, \rho)$. If $\rho = f_i^{-1}(r)$ in the

=E=i 






























occurs 


(49) 


yfa^ / The concentrated likelihood is 

i [ 


(2^(1- P #W 


J (l-Po 2 ) 


the maximum of (49) occurs at 


(50) 


(7 2 = 


a l a h( l -Po r ) 

n(i-pI) 


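The likelihood ratio criterion is, therefore,

$\dfrac{\max_{\omega} L}{\max_{\Omega} L} = \dfrac{(1 - \rho_0^2)^{N/2}(1 - r^2)^{N/2}}{(1 - \rho_0 r)^{N}} = \left[\dfrac{(1 - \rho_0^2)(1 - r^2)}{(1 - \rho_0 r)^2}\right]^{N/2}.$

The likelihood ratio test is $(1 - \rho_0^2)(1 - r^2)(1 - \rho_0 r)^{-2} \le c$, where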




4.2.3. The Asymptotic Distribution of a Sample Correlation Coefficient and
Fisher's z

In this section we shall show that as the sample size increases, a sample


i| I |,,l i Ml k^M 


、 ' " yfcMqM ’ 

where $C_{gh}(n) = A_{gh}(n)/\sqrt{\sigma_{gg}\sigma_{hh}}$. The set $C_{ii}(n)$, $C_{jj}(n)$, and $C_{ij}(n)$ is distributed
like the distinct elements of the matrix



Cu(n) \ 


















A random sample of size N drawn from this finite population has a probabil- 


that it is not necessary to assume knowledge of the parent population; 
a disadvantage is the massive computation. 


4.3. PARTIAL CORRELATION COEFFICIENTS;
CONDITIONAL DISTRIBUTIONS


4.3.1. Estimation of Partial Correlation Coefficients

maximum likelihood estimator of $\Sigma$ is $(1/N)A$, where

$A = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})' = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$

and $\bar{x} = (\bar{x}^{(1)\prime}, \bar{x}^{(2)\prime})'$. The correspondence between $\Sigma$ and
$\Sigma_{11\cdot2}$, $\beta$, and $\Sigma_{22}$ is one-to-one by virtue of (2) and (3) and

(5)  $\Sigma_{12} = \beta\Sigma_{22},$

(6)  $\Sigma_{11} = \Sigma_{11\cdot2} + \beta\Sigma_{22}\beta'.$

We can now apply Corollary 3.2.1 to the effect that maximum likelihood
estimators of functions of parameters are those functions of the maximum
















































ibution of a sample partial correlation . p based on a 

from a distribution with population correlation p j7 . (?+1 p 





















covariances 


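$x^*_{1\alpha}$ and $x^{(2)}_\alpha$ is proportional to


(8) $\sum_{\alpha=1}^{N}\left[(x_{1\alpha}-\bar{x}_1) - \hat{\beta}'(x^{(2)}_\alpha - \bar{x}^{(2)})\right](x^{(2)}_\alpha - \bar{x}^{(2)})' = a_{(1)}' - \hat{\beta}'A_{22} = 0.$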

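The right-hand side of (7) can be written as the left-hand side plus

(9) $\sum_{\alpha=1}^{N}\left[(\hat{\beta}-a)'(x^{(2)}_\alpha-\bar{x}^{(2)})\right]^2 = (\hat{\beta}-a)'\sum_{\alpha=1}^{N}(x^{(2)}_\alpha-\bar{x}^{(2)})(x^{(2)}_\alpha-\bar{x}^{(2)})'(\hat{\beta}-a),$


which is 0 if and only if $a = \hat{\beta}$. To prove the third assertion we consider the
vector $a$ for which $\sum_{\alpha=1}^{N}[a'(x^{(2)}_\alpha-\bar{x}^{(2)})]^2 = \sum_{\alpha=1}^{N}[\hat{\beta}'(x^{(2)}_\alpha-\bar{x}^{(2)})]^2$, since the
correlation is unchanged when the linear function is multiplied by a positive
constant. From (7) we obtain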

( 10 ) L [ P '( 4 2) -^ (2) )] 2 

«=i ‘ , = l 

<«„-2 l ：(^ ( I -5 1 ) fl ， ( A ：«- i (2) ) + £ kK )-’ 2 ))] 2 ， 


from which we deduce 

DK 2 )- 


«；„P 



which is (5). _ 

Thus $\bar{x}_1 + \hat{\beta}'(x^{(2)}_\alpha - \bar{x}^{(2)})$ is the best linear predictor of $x_{1\alpha}$ in the sample,

and $\hat{\beta}'x^{(2)}_\alpha$ is the linear function of $x^{(2)}_\alpha$ that has maximum sample correlation


and the length squared of the first vector is E^ = 
the cosine of the angle between the first vector 











B 


id linear combinations of the components of x^ } cor: 
ic property that R is the cosine of the smallest angle 
components x ]a ~x l and a vector in the hyperplane 
_ 1 vectors. 

ric interpretations are in terms of the vectors in the 






It will be convenient to refer to the multiple correlation de

$z_{i\alpha}$ as the multiple correlation without subtracting the means.

The population multiple correlation $\bar{R}$ is essentially the

the parameters $\mu$ and $\Sigma$ that is invariant under changes of

of scale of $X_1$ and nonsingular linear transformations
transformations $X_1^* = cX_1 + d$, $X^{(2)*} = CX^{(2)} + d$. Similarly,
ple correlation coefficient $R$ is essentially the only function


4.4.2. 


















The likelihood 


linates Let 

he null hypothesis. 









by (30 )，where R is the sample multiple correlation coefficient defined by (5). 

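As an example consider the data given at the end of Section 4.3.1. The
sample multiple correlation coefficient is found from

(31) $1 - R^2 = \dfrac{\begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{32} & 1 \end{vmatrix}} = \dfrac{\begin{vmatrix} 1.00 & 0.80 & -0.40 \\ 0.80 & 1.00 & -0.56 \\ -0.40 & -0.56 & 1.00 \end{vmatrix}}{\begin{vmatrix} 1.00 & -0.56 \\ -0.56 & 1.00 \end{vmatrix}} = 0.357.$

Thus $R$ is 0.802. If we wish to test the hypothesis at the 0.01 level that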
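The determinant ratio used in this example can be checked numerically. The following is a minimal sketch, assuming the three correlations of the example are 0.80, -0.40, and -0.56; the function names `det` and `multiple_correlation` are illustrative and not part of the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def multiple_correlation(corr):
    """R between variable 1 and the remaining variables, via 1 - R^2 = |corr| / |corr_22|."""
    sub = [row[1:] for row in corr[1:]]  # correlation matrix of variables 2, ..., p
    one_minus_r2 = det(corr) / det(sub)
    return (1.0 - one_minus_r2) ** 0.5


corr = [[1.00, 0.80, -0.40],
        [0.80, 1.00, -0.56],
        [-0.40, -0.56, 1.00]]
R = multiple_correlation(corr)
print(round(R, 3))  # 0.802
```

Cofactor expansion is used only to keep the sketch dependency-free; for larger matrices a numerically stable factorization would be preferable.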













[see (18)]

$\dfrac{R^2}{1-R^2}\cdot\dfrac{N-p}{p-1} = \dfrac{\hat{\beta}'A_{22}\hat{\beta}}{a_{11\cdot 2}}\cdot\dfrac{N-p}{p-1}$





















ie distribution of Zi 2 ) is MO, U，the distribution of 




r[f(iV-l)+a] 


(p-l)ff N - u+a 

N-p 


is {df=[(N~p)/(p~~ 1)1(1 - 


w ^p-0 + a-l T ^ N _ 1) +a ] 


a!r[;(p - 1) + a] 


^ed to multiply (38) by the density 
of W and Z 产， … ， Z? and then 
)btain the marginal density of W. 




Thus (p ^ 22 P/°'ii- 2 Vl._^ A 1 -i? )] has a /-distribution with n degrees ( 


freedom. Let 


■R 2 )= 0. Then ^ ， A 22 ^/a u . 2 = 0^ rt 2 . We compute 


(41) fe- 叫年 ) 




e-^ u du 




2 in r(U) 

V _r(jn + a) 


2^r({n) 

u {n + a-\ e - |(l + ^)u 


(14 - c^y n + a r(|n) J 0 2^ + tt r(^ + a) 
<t> a r (^ + a) 


^ n+a ~ l e~i u du 


(1+ 办)卜 r(|n) 

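Applying this result to (38), we obtain as the density of $R^2$


(42) $\dfrac{(1-R^2)^{\frac{1}{2}(n-p-1)}(1-\bar{R}^2)^{\frac{1}{2}n}}{\Gamma[\frac{1}{2}(n-p+1)]\,\Gamma(\frac{1}{2}n)} \sum_{\mu=0}^{\infty} \dfrac{(\bar{R}^2)^{\mu}(R^2)^{\frac{1}{2}(p-1)+\mu-1}\,\Gamma^2(\frac{1}{2}n+\mu)}{\mu!\,\Gamma[\frac{1}{2}(p-1)+\mu]}.$


Fisher (1928) found this distribution. It can also be written


(43) $\dfrac{\Gamma(\frac{1}{2}n)}{\Gamma[\frac{1}{2}(n-p+1)]\,\Gamma[\frac{1}{2}(p-1)]}(R^2)^{\frac{1}{2}(p-3)}(1-R^2)^{\frac{1}{2}(n-p-1)}(1-\bar{R}^2)^{\frac{1}{2}n}\,F[\tfrac{1}{2}n, \tfrac{1}{2}n; \tfrac{1}{2}(p-1); \bar{R}^2R^2],$


where $F$ is the hypergeometric function defined in (41) of Section 4.2.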
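The series form of the density of $R^2$ found by Fisher can be evaluated term by term. The sketch below is an illustration under stated assumptions: the series is truncated at a fixed number of terms (the parameter `terms` is hypothetical), `n` stands for $N - 1$, and `rbar2` stands for the squared population multiple correlation. Each term is formed on the log scale to avoid overflow in the gamma functions.

```python
import math


def r2_density(r2, rbar2, n, p, terms=200):
    """Truncated series for the density of R^2 (n = N - 1, p variables, 0 <= rbar2 < 1)."""
    if not 0.0 < r2 < 1.0:
        return 0.0
    log_front = (0.5 * (n - p - 1) * math.log(1.0 - r2)
                 + 0.5 * n * math.log(1.0 - rbar2)
                 - math.lgamma(0.5 * (n - p + 1))
                 - math.lgamma(0.5 * n))
    total = 0.0
    for mu in range(terms):
        if rbar2 == 0.0 and mu > 0:
            break  # only the mu = 0 term survives when rbar2 = 0
        log_term = ((0.5 * (p - 1) + mu - 1) * math.log(r2)
                    + 2.0 * math.lgamma(0.5 * n + mu)
                    - math.lgamma(mu + 1)
                    - math.lgamma(0.5 * (p - 1) + mu))
        if mu > 0:
            log_term += mu * math.log(rbar2)
        total += math.exp(log_front + log_term)
    return total


# Midpoint-rule check that the density integrates to about 1 (n = 20, p = 3).
steps = 2000
area = sum(r2_density((k + 0.5) / steps, 0.3, 20, 3) for k in range(steps)) / steps
print(area)
```

When `rbar2` is 0 only the first term remains, and the expression reduces to a beta density with parameters $\frac{1}{2}(p-1)$ and $\frac{1}{2}(n-p+1)$, which gives a convenient sanity check.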


攀 



















Theorem 4.4.7. On the basis of observations $x_1, \ldots, x_N$ from $N(\mu, \Sigma)$, of
all tests of $\bar{R} = 0$ at a given significance level with power depending only on $\bar{R}$, the
test with critical region given by $R$ greater than a constant is uniformly most
powerful.

Theorem 4.4.7 follows from Theorem 4.4.6 in the same way that Theorem 
5.6.4 follows from Theorem 5.6.1. 

4.5. ELLIPTICALLY CONTOURED DISTRIBUTIONS 
4.5.1. Observations Elliptically Contoured 

Suppose $x_1, \ldots, x_N$ are $N$ independent observations on a random $p$-vector $X$
with density

(1) $|\Lambda|^{-1/2} g[(x-\nu)'\Lambda^{-1}(x-\nu)].$

The sample covariance matrix $S$ is an unbiased estimator of the covariance
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fient OJ JKS) nan u nurucnj “i 刀 c ，j ^ • … -rr - — , 

sample 介⑽⑴ such that < oo . Then 

(3) 灰 [/ ⑷ _/ ⑷] = 4^ 灰 ( 卜 0 ^ 0 〆 1 ) 

+k)(x ® x)+K oror ， H^^)}' 

Corollary 4.5.1. If

(4) $f(cs) = f(s)$

for all $c > 0$ and all positive definite $S$ and the conditions of Theorem 4.5.1 hold,
then

(5) -/(or)] + 









S 22 A I p _ x . Furthermore, for kj¥=l and ; = / = 1, Lemma 3.6.1 gives 

( 13 ) ^(D s (i) = + 

Theorem 4.5.2. Under the conditions of Theorem 4.5.1 

( 14 ) ]] TT^ s m^ N ( 0 > r p-^)- 

Corollary 4.5.4. Under the conditions of Theorem 4.5.1 

W — ^ 0) S^s 0) d 

( 15) 1 + K (1 + K)S U 

4.5.2. Elliptically Contoured Matrix Distributions




n the Appendix (Section A.5.1). This 
in Section 3.2; see Figure 3.1. See also 


ir, U'U = I p , and V'V=TT'. The last 
= 0, / < y, can be solved uniquely for T. 
le restrictions). 

尺 （ FT)、and let O s be an orthogonal 










The space of $U$ satisfying $U'U = I_p$ is known as a Stiefel manifold. The
probability measure of Definition 4.5.1 is known as the Haar invariant
distribution. The property $O_N U \stackrel{d}{=} U$ for all orthogonal $O_N$ defines the (normalized)
measure uniquely [Halmos (1956)].

Theorem 4.5.4. If $Y$ ($N \times p$) has the density $g(Y'Y)$, then $U$ defined by
$Y = UT'$, $U'U = I_p$, $t_{ii} > 0$, $i = 1, \ldots, p$, and $t_{ij} = 0$, $i < j$, is uniformly distributed
on $O(N \times p)$.
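The factorization $Y = UT'$ in Theorem 4.5.4 can be computed by Gram-Schmidt orthogonalization of the columns of $Y$. The following is a minimal sketch of that construction only (it does not demonstrate the uniformity claim); the function name `decompose` and the sample matrix are illustrative assumptions, and the columns of $Y$ are assumed linearly independent.

```python
import math


def decompose(y):
    """Return (U, T) with Y = U T', U'U = I_p, T lower triangular, t_ii > 0."""
    n, p = len(y), len(y[0])
    u = [[0.0] * p for _ in range(n)]
    t = [[0.0] * p for _ in range(p)]
    for j in range(p):
        col = [y[a][j] for a in range(n)]
        for i in range(j):
            # t_ji is the coefficient of the i-th orthonormal column in column j of Y
            t[j][i] = sum(u[a][i] * y[a][j] for a in range(n))
            col = [col[a] - t[j][i] * u[a][i] for a in range(n)]
        t[j][j] = math.sqrt(sum(v * v for v in col))
        for a in range(n):
            u[a][j] = col[a] / t[j][j]
    return u, t


Y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 9.0]]
U, T = decompose(Y)
print(T)
```

Classical Gram-Schmidt is chosen here for readability; a Householder-based QR factorization would be the more stable choice for ill-conditioned $Y$.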

The proof of Corollary 7.2.1 shows that for arbitrary $g(\cdot)$ the density of
$T$ is

(2i) fi{c[KN+i-o]fr}g( tr3T， ). 

where $C(\cdot)$ is defined in (8) of Section 2.7.

The stochastic representation of $Y$ ($N \times p$) with density $g(Y'Y)$ is


(22) $Y = UT'$,


The condition (24) of Corollary 4.5.5 is that $f(X)$ is invariant with respect
to linear transformations $X \to XG$.

The density (18) can be written as

(25) $|C|^{-N} g\{C^{-1}[A + N(\bar{x}-\nu)(\bar{x}-\nu)'](C')^{-1}\},$

which shows that $A$ and $\bar{x}$ are a complete set of sufficient statistics for
$\Lambda = CC'$ and $\nu$.


PROBLEMS 
4.1. (Sec. 4.2.1) Sketch


$k_N(r) = \dfrac{\Gamma[\frac{1}{2}(N-1)]}{\Gamma[\frac{1}{2}(N-2)]\sqrt{\pi}}(1-r^2)^{\frac{1}{2}(N-4)}$


for (a) $N = 3$, (b) $N = 4$, (c) $N = 5$, and (d) $N = 10$.
tions in a sample of 140 are [Kelley (1928)]

    1.0000  0.4248  0.0420  0.0215  0.0573
    0.4248  1.0000  0.1487  0.2489  0.2843
    0.0420  0.1487  1.0000  0.6693  0.4662
    0.0215  0.2489  0.6693  1.0000  0.6915
    0.0573  0.2843  0.4662  0.6915  1.0000

(a) Find the partial correlation between $X_4$ and $X_5$, holding $X_3$ fixed.

(b) Find the partial correlation between $X_1$ and $X_2$, holding $X_3$, $X_4$, and $X_5$
fixed.

(c) Find the multiple correlation between $X_1$ and the set $X_3$, $X_4$, and $X_5$.

(d) Test the hypothesis at the 1% significance level that arithmetic speed is
independent of the three memory scores.






CHAPTER 5 


The Generalized $T^2$-Statistic


5.1. INTRODUCTION 

One of the most important groups of problems in univariate statistics relates
to the mean of a given distribution when the variance of the distribution is
unknown. On the basis of a sample one may wish to decide whether the
mean is equal to a number specified in advance, or one may wish to give an
interval within which the mean lies. The statistic usually used in univariate
statistics is the difference between the mean of the sample $\bar{x}$ and the
hypothetical population mean $\mu$ divided by the sample standard deviation $s$.
If the distribution sampled is $N(\mu, \sigma^2)$, then

(1) $t = \sqrt{N}\,\dfrac{\bar{x}-\mu}{s}$

has the well-known $t$-distribution with $N - 1$ degrees of freedom, where $N$ is
the number of observations in the sample. On the basis of this fact, one can
set up a test of the hypothesis $\mu = \mu_0$, where $\mu_0$ is specified, or one can set
up a confidence interval for the unknown parameter $\mu$.

The multivariate analog of the square of $t$ given in (1) is

(2) $T^2 = N(\bar{x}-\mu)'S^{-1}(\bar{x}-\mu),$

where $\bar{x}$ is the mean vector of a sample of $N$, and $S$ is the sample covariance
matrix. It will be shown how this statistic can be used for testing hypotheses
about the mean vector $\mu$ of the population and for obtaining confidence
regions for the unknown $\mu$. The distribution of $T^2$ will be obtained when $\mu$


different 






.6, optimum properties 
ariance and admissibil- 
l exponential family is 


n elliptically contoured distributions. 

5.2. DERIVATION OF THE GENERALIZED $T^2$-STATISTIC
AND ITS DISTRIBUTION

5.2.1. Derivation of the $T^2$-Statistic As a Function of the Likelihood


me upservauons are given; h is a function oJt t 

shall not distinguish in notation between the inc 

ters.) The likelihood ratio criterion is 







( 2 ) 


















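(Section 3.2) of $\mu$ and $\Sigma$,


$\hat{\mu}_{\Omega} = \bar{x}, \qquad \hat{\Sigma}_{\Omega} = \dfrac{1}{N}\sum_{\alpha=1}^{N}(x_\alpha-\bar{x})(x_\alpha-\bar{x})'.$


When $\mu = \mu_0$, the likelihood function is maximized at


$\hat{\Sigma}_{\omega} = \dfrac{1}{N}\sum_{\alpha=1}^{N}(x_\alpha-\mu_0)(x_\alpha-\mu_0)'$


by Lemma 3.2.2. Furthermore, by Lemma 3.2.2


$\max_{\Sigma} L(\mu_0, \Sigma) = \dfrac{1}{(2\pi)^{\frac{1}{2}pN}|\hat{\Sigma}_{\omega}|^{\frac{1}{2}N}}e^{-\frac{1}{2}pN}, \qquad \max_{\mu,\Sigma} L(\mu, \Sigma) = \dfrac{1}{(2\pi)^{\frac{1}{2}pN}|\hat{\Sigma}_{\Omega}|^{\frac{1}{2}N}}e^{-\frac{1}{2}pN}.$


Thus the likelihood ratio criterion is


$\lambda = \dfrac{|\hat{\Sigma}_{\Omega}|^{\frac{1}{2}N}}{|\hat{\Sigma}_{\omega}|^{\frac{1}{2}N}} = \dfrac{|A|^{\frac{1}{2}N}}{|A + N(\bar{x}-\mu_0)(\bar{x}-\mu_0)'|^{\frac{1}{2}N}},$

where

(9) $A = \sum_{\alpha=1}^{N}(x_\alpha-\bar{x})(x_\alpha-\bar{x})' = (N-1)S.$

Application of Corollary A.3.1 of the Appendix shows

(10) $\lambda^{2/N} = \dfrac{|A|}{|A + [\sqrt{N}(\bar{x}-\mu_0)][\sqrt{N}(\bar{x}-\mu_0)]'|} = \dfrac{1}{1 + N(\bar{x}-\mu_0)'A^{-1}(\bar{x}-\mu_0)} = \dfrac{1}{1 + T^2/(N-1)},$

where

(11) $T^2 = N(\bar{x}-\mu_0)'S^{-1}(\bar{x}-\mu_0) = (N-1)N(\bar{x}-\mu_0)'A^{-1}(\bar{x}-\mu_0).$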
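The relation between the likelihood ratio criterion and $T^2$, namely $\lambda^{2/N} = 1/[1 + T^2/(N-1)]$, can be verified numerically. The following is a small sketch for the bivariate case ($p = 2$); the data and the helper names are hypothetical and serve only as an illustration.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def scatter(rows, center):
    """2x2 matrix of sums of squares and cross products about `center`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def quad_form_inv(m, d):
    """d' m^{-1} d for a 2x2 matrix m."""
    return (m[1][1] * d[0] ** 2 - (m[0][1] + m[1][0]) * d[0] * d[1]
            + m[0][0] * d[1] ** 2) / det2(m)


def t2_statistic(rows, mu0):
    """T^2 = N (xbar - mu0)' S^{-1} (xbar - mu0) with S = A / (N - 1)."""
    n = len(rows)
    xbar = mean_vector(rows)
    a = scatter(rows, xbar)
    s = [[a[i][j] / (n - 1) for j in range(2)] for i in range(2)]
    d = [xbar[0] - mu0[0], xbar[1] - mu0[1]]
    return n * quad_form_inv(s, d)


def lr_criterion_power(rows, mu0):
    """lambda^(2/N) = |A| / |A + N (xbar - mu0)(xbar - mu0)'|."""
    # The scatter about mu0 equals A + N (xbar - mu0)(xbar - mu0)'.
    return det2(scatter(rows, mean_vector(rows))) / det2(scatter(rows, mu0))


data = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 5.5)]
mu0 = (2.0, 2.0)
t2 = t2_statistic(data, mu0)
print(t2, lr_criterion_power(data, mu0), 1.0 / (1.0 + t2 / (len(data) - 1)))
```

The last two printed values should agree up to rounding, which reflects the fact that the likelihood ratio test and the $T^2$-test reject for the same samples.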








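by some arbitrary rule (Lemma A.4.2 of the Appendix). Since $Q$ depends on
$Y^*$, it is a random matrix. Now let


(21) $U = QY^*, \qquad B = QnS^*Q'.$


From the way $Q$ was defined,


$U_1 = \sum_i q_{1i}Y_i^* = \sqrt{Y^{*\prime}Y^*},$

$U_j = \sum_i q_{ji}Y_i^* = \sqrt{Y^{*\prime}Y^*}\sum_i q_{ji}q_{1i} = 0, \qquad j \ne 1.$


Then


(23) $\dfrac{T^2}{n} = U'B^{-1}U = (U_1, 0, \ldots, 0)\begin{pmatrix} b^{11} & b^{12} & \cdots & b^{1p} \\ b^{21} & b^{22} & \cdots & b^{2p} \\ \vdots & \vdots & & \vdots \\ b^{p1} & b^{p2} & \cdots & b^{pp} \end{pmatrix}\begin{pmatrix} U_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = U_1^2 b^{11},$


where $(b^{ij}) = B^{-1}$. By Theorem A.3.3 of the Appendix, $1/b^{11} = b_{11} - b_{(1)}'B_{22}^{-1}b_{(1)} = b_{11\cdot 2,\ldots,p}$, where


(24) $B = \begin{pmatrix} b_{11} & b_{(1)}' \\ b_{(1)} & B_{22} \end{pmatrix},$


and $T^2/n = U_1^2/b_{11\cdot 2,\ldots,p} = Y^{*\prime}Y^*/b_{11\cdot 2,\ldots,p}$. The conditional distribution of
$B$ given $Q$ is that of $\sum_{\alpha=1}^{n} V_\alpha V_\alpha'$, where conditionally the $V_\alpha = QZ_\alpha^*$ are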















jquently + AT 2 ) (f ⑴- 

nder the null hypothesis. If we let 


f E (乂 1 LK 1 )-，))， 

\ ar = l 

+ gw 2) - 沪) U 2) -，>))， 










Simultaneous confidence 




























■ 


noncentral T -distnbi 
38) of the probability 


able to bring down the probability of a Type II error. Thus, if we use a
significance level of 5%, the probability of Type II error (for $\phi = 2.5$) is only
0.043.

Lehmer (1944) has computed tables of $\phi$ for given significance level and
given probability of Type II error. Here tables can be used to see what value
of $\tau^2$ is needed to make the probability of acceptance of the null hypothesis
sufficiently low when $\mu \ne 0$. For instance, if we want to be able to reject the
hypothesis $\mu = 0$ on the basis of a sample for a given $\mu$ and $\Sigma$, we may be
able to choose $N$ so that $N\mu'\Sigma^{-1}\mu = \tau^2$ is sufficiently large. Of course, the
difficulty with these considerations is that we usually do not know exactly the
values of $\mu$ and $\Sigma$ (and hence of $\tau^2$) for which we want the probability of
rejection at a certain value.

The distribution of $T^2$ when the null hypothesis is not true was derived by
different methods by Hsu (1938) and Bose and Roy (1938).


5.5. THE TWO-SAMPLE PROBLEM WITH UNEQUAL 
COVARIANCE MATRICES 


If the covariance matrices are not the same, the $T^2$-test for equality of mean
vectors has a probability of rejection under the null hypothesis that depends 












and covariance matrix


(4) $\mathcal{E}(\bar{x}^{(2)}-\mu^{(2)})(\bar{x}^{(2)}-\mu^{(2)})' = \dfrac{1}{N_2}\Sigma_2.$

Thus $\bar{x}^{(1)}-\bar{x}^{(2)}$ has mean $\mu^{(1)}-\mu^{(2)}$ and covariance matrix $(1/N_1)\Sigma_1 + (1/N_2)\Sigma_2$.
We cannot use the technique of Section 5.2, however, because

(5) $\sum_{\alpha=1}^{N_1}(x_\alpha^{(1)}-\bar{x}^{(1)})(x_\alpha^{(1)}-\bar{x}^{(1)})' + \sum_{\alpha=1}^{N_2}(x_\alpha^{(2)}-\bar{x}^{(2)})(x_\alpha^{(2)}-\bar{x}^{(2)})'$

does not have the Wishart distribution with covariance matrix a multiple of
$(1/N_1)\Sigma_1 + (1/N_2)\Sigma_2$.

If $N_1 = N_2 = N$, say, we can use the $T^2$-test in an obvious way. Let
$y_\alpha = x_\alpha^{(1)} - x_\alpha^{(2)}$ (assuming the numbering of the observations in the two
samples is independent of the observations themselves). Then $y_\alpha$ is normally
distributed with mean $\mu^{(1)}-\mu^{(2)}$ and covariance matrix $\Sigma_1 + \Sigma_2$, and
$y_1, \ldots, y_N$ are independent. Let $\bar{y} = (1/N)\sum_{\alpha=1}^{N} y_\alpha = \bar{x}^{(1)}-\bar{x}^{(2)}$, and define $S$
by

(6) $(N-1)S = \sum_{\alpha=1}^{N}(y_\alpha-\bar{y})(y_\alpha-\bar{y})' = \sum_{\alpha=1}^{N}(x_\alpha^{(1)}-x_\alpha^{(2)}-\bar{x}^{(1)}+\bar{x}^{(2)})(x_\alpha^{(1)}-x_\alpha^{(2)}-\bar{x}^{(1)}+\bar{x}^{(2)})'.$

Then

(7) $T^2 = N\bar{y}'S^{-1}\bar{y}$

is suitable for testing the hypothesis $\mu^{(1)}-\mu^{(2)} = 0$, and has the $T^2$-distribution
with $N - 1$ degrees of freedom. It should be observed that if we had
known $\Sigma_1 = \Sigma_2$, we would have used a $T^2$-statistic with $2N - 2$ degrees of
freedom; thus we have lost $N - 1$ degrees of freedom in constructing a test
which is independent of the two covariance matrices. If $N_1 = N_2 = 50$ as in
the example in Section 5.3.4, then $T^2_{4,49}(.01) = 15.93$ as compared to $T^2_{4,98}(.01) = 14.52$.
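For equal sample sizes the procedure above reduces to a one-sample $T^2$ computed from the paired differences. The sketch below illustrates this for bivariate observations; the function name `paired_t2` and the toy samples are assumptions made for illustration, and the comparison with a tabulated significance point is left to the reader.

```python
def paired_t2(sample1, sample2):
    """T^2 = N ybar' S^{-1} ybar for y_a = x_a^(1) - x_a^(2), bivariate case."""
    n = len(sample1)
    y = [(a[0] - b[0], a[1] - b[1]) for a, b in zip(sample1, sample2)]
    ybar = (sum(v[0] for v in y) / n, sum(v[1] for v in y) / n)
    # S = (1 / (N - 1)) * sum of (y_a - ybar)(y_a - ybar)'
    s11 = sum((v[0] - ybar[0]) ** 2 for v in y) / (n - 1)
    s22 = sum((v[1] - ybar[1]) ** 2 for v in y) / (n - 1)
    s12 = sum((v[0] - ybar[0]) * (v[1] - ybar[1]) for v in y) / (n - 1)
    det = s11 * s22 - s12 ** 2
    quad = (s22 * ybar[0] ** 2 - 2.0 * s12 * ybar[0] * ybar[1]
            + s11 * ybar[1] ** 2) / det
    return n * quad


first = [(2.0, 1.0), (2.0, 3.0), (2.0, 1.0), (4.0, 5.0)]
second = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 5.0)]
print(paired_t2(first, second))  # about 12 for this toy data
```

Because only the differences enter, the statistic does not require the two covariance matrices to be equal, at the cost of the lost degrees of freedom noted in the text.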

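Now let us turn our attention to the case of $N_1 \ne N_2$. For convenience, let
$N_1 < N_2$. Then we define

(8) $y_\alpha = x_\alpha^{(1)} - \sqrt{\dfrac{N_1}{N_2}}\,x_\alpha^{(2)} + \dfrac{1}{\sqrt{N_1N_2}}\sum_{\beta=1}^{N_1}x_\beta^{(2)} - \dfrac{1}{N_2}\sum_{\gamma=1}^{N_2}x_\gamma^{(2)}, \qquad \alpha = 1, \ldots, N_1.$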












































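such that

(6) $\Pr\{\text{Reject } H_0 \mid T, \omega\} \le \Pr\{\text{Reject } H_0 \mid T^*, \omega\}, \qquad \omega \in \Omega_0,$

(7) $\Pr\{\text{Reject } H_0 \mid T, \omega\} \ge \Pr\{\text{Reject } H_0 \mid T^*, \omega\}, \qquad \omega \in \Omega - \Omega_0,$

with strict inequality for at least one $\omega$.

The admissibility of the $T^2$-test follows from a theorem of Stein (1956a)
that applies to any exponential family of distributions.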





ieasure m(A) of a set >1 e ^ is the ordinary Lebesgue 
that maps into the set A. (Note that the probability 
efined by a density.) 


Theorem 5.6.5 (Stein). Let (^/, 成 m，n，P) be c 
d a nonempty proper subset of O. (i) Let A be a sut 



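of a better test, that is,

(10) $\int \phi(y)\,dP_\omega(y) \le \int \phi_A(y)\,dP_\omega(y), \qquad \omega \in \Omega_0,$

(11) $\int \phi(y)\,dP_\omega(y) \ge \int \phi_A(y)\,dP_\omega(y), \qquad \omega \in \Omega - \Omega_0,$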
























( 14 ) / [^(y) - dP„ x (y) 

=^T('L a ) / [6( 力 - Hy)]^ dm(y) 

= J[Uy)~ 4>(y)}e^dP^y) 

= [Uy)~ Hy)]e^ y - c) dP4y) 

= H ^ eXc { L y > c [Uy) ~ ❿)] eA( — -c)d 尸 

+ /… JAW ， )]〆(—-〜&( jo}. 

For $\omega_1'y > c$ we have $\phi_A(y) = 1$ and $\phi_A(y) - \phi(y) \ge 0$, and $\{y \mid \phi_A(y) - \phi(y) > 0\}$
has positive measure; therefore, the first integral in the braces approaches
$\infty$ as $\lambda \to \infty$. The second integral is bounded because the integrand
is bounded by 1, and hence the last expression is positive for sufficiently large
$\lambda$. This contradicts (11). ■

This proof was given by Stein (1956a). It is a generalization of a theorem
of Birnbaum (1955).

Corollary 5.6.2. If the conditions of Theorem 5.6.5 hold except that $A$ is
not necessarily closed, but the boundary of $A$ has $m$-measure 0, then the
conclusion of Theorem 5.6.5 holds.

Proof. The closure of $A$ is convex (Problem 5.18), and the test with
acceptance region equal to the closure of $A$ differs from $A$ by a set of
probability 0 for all $\omega \in \Omega$. Furthermore,


(15) $A \cap \{y \mid \omega'y > c\} = \emptyset \;\Rightarrow\; A \subset \{y \mid \omega'y \le c\}$








(26) $\dfrac{\int f(x \mid \omega)\,\Pi_1(d\omega)}{\int f(x \mid \omega)\,\Pi_0(d\omega)} \ge c$

for some $c$ ($0 < c < \infty$). If equality in (26) occurs with probability 0 for all
$\omega \in \Omega_0$, then the Bayes procedure is unique and hence admissible. Since the
measures are finite, they can be normed to be probability measures. For the
$T^2$-test of $H_0$: $\mu = 0$ a pair of measures is suggested in Problem 5.15. (This
pair is not unique.) The reader can verify that with these measures (26)
reduces to the complement of (20).

Among invariant tests it was shown that the $T^2$-test is uniformly most
powerful; that is, it is most powerful against every value of $\mu'\Sigma^{-1}\mu$ among
invariant tests of the specified significance level. We can ask whether the
$T^2$-test is "best" against a specified value of $\mu'\Sigma^{-1}\mu$ among all tests. Here
"best" can be taken to mean admissible minimax; and "minimax" means
maximizing with respect to procedures the minimum with respect to parameter
values of the power. This property was shown in the simplest case of
$p = 2$ and $N = 3$ by Giri, Kiefer, and Stein (1963). The property for general $p$
and $N$ was announced by Šalaevskiĭ (1968). He has furnished a proof for the
case of $p = 2$ [Šalaevskiĭ (1971)], but has not given a proof for $p > 2$.

Giri and Kiefer (1964) have proved the $T^2$-test is locally minimax (as
$\mu'\Sigma^{-1}\mu \to 0$) and asymptotically (logarithmically) minimax as $\mu'\Sigma^{-1}\mu \to \infty$.

5.7. ELLIPTICALLY CONTOURED DISTRIBUTIONS
5.7.1. Observations Elliptically Contoured

When $x_1, \ldots, x_N$ constitute a sample of $N$ from

(1) $|\Lambda|^{-1/2} g[(x-\nu)'\Lambda^{-1}(x-\nu)],$

the sample mean $\bar{x}$ and covariance $S$ are unbiased estimators of the distribution
mean $\mu = \nu$ and covariance matrix $\Sigma = (\mathcal{E}R^2/p)\Lambda$, where $R^2 =
(X-\nu)'\Lambda^{-1}(X-\nu)$ has finite expectation. The $T^2$-statistic, $T^2 = N(\bar{x}-\mu)'S^{-1}(\bar{x}-\mu)$,
can be used for tests and confidence regions for $\mu$ when $\Sigma$
(or $\Lambda$) is unknown, but the small-sample distribution of $T^2$ in general is
difficult to obtain. However, the limiting distribution of $T^2$ when $N \to \infty$ is
obtained from the facts that $\sqrt{N}(\bar{x}-\mu) \xrightarrow{d} N(0, \Sigma)$ and $S \xrightarrow{p} \Sigma$ (Theorem
3.6.2).

5.10. (Sec. 5.2.2) From Problems 5.5-5.9, verify Corollary 5.2.1.

5.11. (Sec. 5.3) Use the data in Section 3.2 to test the hypothesis that neither drug
has a soporific effect at significance level 0.01.


5.15. (Sec. 5.6.2) $T^2$-test as a Bayes procedure [Kiefer and Schwartz (1965)]. Let
$x_1, \ldots, x_N$ be independently distributed, each according to $N(\mu, \Sigma)$. Let $\Pi_0$ be
defined by $[\mu, \Sigma] = [0, (I + \eta\eta')^{-1}]$ with $\eta$ having a density proportional to
$|I + \eta\eta'|^{-\frac{1}{2}N}$, and let $\Pi_1$ be defined by $[\mu, \Sigma] = [(I + \eta\eta')^{-1}\eta, (I + \eta\eta')^{-1}]$
with $\eta$ having a density proportional to

$|I + \eta\eta'|^{-\frac{1}{2}N}\exp[\tfrac{1}{2}N\eta'(I + \eta\eta')^{-1}\eta].$

(a) Show that the measures are finite for $N > p$ by showing $\eta'(I + \eta\eta')^{-1}\eta \le 1$
and verifying that the integral of $|I + \eta\eta'|^{-\frac{1}{2}N} = (1 + \eta'\eta)^{-\frac{1}{2}N}$ is finite.

(b) Show that the inequality (26) is equivalent to $N\bar{x}'(\sum_{\alpha=1}^{N}x_\alpha x_\alpha')^{-1}\bar{x} \ge k$.


next 
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uppose 










N'<N g ，g = 2, … ， q; and ( 叶)，…， 々 ), i = 1 ， … ，分 一 1 ， are linearly inde- 



(b) Show how to construct a $T^2$-test of the hypothesis using $(\bar{y}^{(1)\prime}, \ldots, \bar{y}^{(q-1)\prime})'$,
yielding an $F$-statistic with $(q-1)p$ and $N - (q-1)p$ degrees of freedom
[Anderson (1963b)].


5.27. (Sec. 5.2) Prove (25) is the density of $V = \chi_a^2/(\chi_a^2 + \chi_b^2)$. [Hint: In the joint
density of $U = \chi_a^2$ and $W = \chi_b^2$ make the transformation $u = vw(1-v)^{-1}$, $w = w$
and integrate out $w$.]


CHAPTER 6 

Classification of Observations 


6.1. THE PROBLEM OF CLASSIFICATION 

The problem of classification arises when an investigator makes a number of 





measurements. The investigator 


each population is characterized by a probability distribution 
merits. Thus an individual is considered as a random obsen 
population. The question is: Given an individual with certain 
from which population did the person arise? 


By 


this 























































optimum procedure. For a general discussion of the concepts in this section
and the next see Wald (1950), Blackwell and Girshick (1954), Ferguson
(1967), DeGroot (1970), and Berger (1980b).


6.3. PROCEDURES OF CLASSIFICATION INTO ONE OF TWO
POPULATIONS WITH KNOWN PROBABILITY DISTRIBUTIONS

6.3.1. The Case When A Priori Probabilities Are Known

We now turn to the problem of choosing regions $R_1$ and $R_2$ so as to minimize
(5) of Section 6.2. Since we have a priori probabilities, we can define joint
probabilities of the population and the observed set of variables. The probability
that an observation comes from $\pi_1$ and that each variate is less than






component in y is 

… / 二 dx x - a 
the conditional probability that 


oming from population $\pi_1$, given an observa-

$\dfrac{q_1p_1(x)}{q_1p_1(x) + q_2p_2(x)}$

112) = C(2| 1) = 1. Then the expected loss is 




























































The left-hand side of C 

(4) -\[x'l.^x-x'\ 

By rearrangement of th 

(5) x'S-V 
The first term is the w(


For the minimax solution we choose c so that

Theorem 6.4.2. If the $\pi_i$ have densities (1), $i = 1, 2$, the minimax regions of
classification are given by (6) where $c = \log k$ is chosen by the condition (16) with
$C(i \mid j)$ the two costs of misclassification.

It should be noted that if the costs of misclassification are equal, $c = 0$ and
the probability of misclassification is

(9) $\displaystyle\int_{\Delta/2}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}y^2}\,dy.$
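With equal costs the probability of misclassification is the upper tail of the standard normal distribution beyond $\Delta/2$. A minimal sketch follows, assuming $\Delta$ denotes the distance between the two populations defined by $\Delta^2 = (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)})$; the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def misclassification_probability(delta):
    """Upper normal tail beyond delta / 2, i.e. Phi(-delta / 2)."""
    return normal_cdf(-0.5 * delta)


for delta in (0.0, 1.0, 2.0, 4.0):
    print(delta, misclassification_probability(delta))
```

The probability falls from one half when the populations coincide toward zero as the populations separate.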



zero to obtain 


(24) $2[(\mu^{(1)}-\mu^{(2)})(\mu^{(1)}-\mu^{(2)})']d = 2\lambda\Sigma d.$

Since $(\mu^{(1)}-\mu^{(2)})'d$ is a scalar, say $\nu$, we can write (24) as

(25) $\mu^{(1)}-\mu^{(2)} = \dfrac{\lambda}{\nu}\Sigma d.$

Thus the solution is proportional to $\delta$.

We may finally note that if we have a sample of $N$ from either $\pi_1$ or $\pi_2$,
we use the mean of the sample and classify it as from $N[\mu^{(1)}, (1/N)\Sigma]$ or
$N[\mu^{(2)}, (1/N)\Sigma]$.

6.5. CLASSIFICATION INTO ONE OF TWO MULTIVARIATE NORMAL 
POPULATIONS WHEN THE PARAMETERS ARE ESTIMATED 

6.5.1. The Criterion of Classification 


Thus far we have assumed that the two populations are known exactly. In 
most applications of this theory the populations are not known, but must be 



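On the basis of this information we wish to classify the observation $x$ as
coming from $\pi_1$ or $\pi_2$. Clearly, our best estimate of $\mu^{(1)}$ is $\bar{x}^{(1)} = \sum_{\alpha=1}^{N_1}x_\alpha^{(1)}/N_1$,
of $\mu^{(2)}$ is $\bar{x}^{(2)} = \sum_{\alpha=1}^{N_2}x_\alpha^{(2)}/N_2$, and of $\Sigma$ is $S$ defined by


(1) $(N_1 + N_2 - 2)S = \sum_{\alpha=1}^{N_1}(x_\alpha^{(1)}-\bar{x}^{(1)})(x_\alpha^{(1)}-\bar{x}^{(1)})' + \sum_{\alpha=1}^{N_2}(x_\alpha^{(2)}-\bar{x}^{(2)})(x_\alpha^{(2)}-\bar{x}^{(2)})'.$


where

(24) $A = \sum_{i=1}^{2}\sum_{\alpha=1}^{N_i}(x_\alpha^{(i)}-\bar{x}^{(i)})(x_\alpha^{(i)}-\bar{x}^{(i)})'.$

Since $(\bar{x}^{(1)}-\bar{x}^{(2)})'b$ is a scalar, we see that the solution $b$ of (23) is proportional
to $S^{-1}(\bar{x}^{(1)}-\bar{x}^{(2)})$.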
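The pooled estimate $S$ and the direction $S^{-1}(\bar{x}^{(1)}-\bar{x}^{(2)})$ can be computed directly from two samples. The sketch below does this for bivariate data and evaluates the sample classification statistic $W = [x - \frac{1}{2}(\bar{x}^{(1)}+\bar{x}^{(2)})]'S^{-1}(\bar{x}^{(1)}-\bar{x}^{(2)})$; the form of $W$ is taken as an assumption here since its definition is not shown in this excerpt, and the data and function names are hypothetical.

```python
def mean2(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def pooled_covariance(sample1, sample2):
    """S with (N1 + N2 - 2) S = sum of the two within-sample scatter matrices."""
    a = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (sample1, sample2):
        m = mean2(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    a[i][j] += d[i] * d[j]
    dof = len(sample1) + len(sample2) - 2
    return [[a[i][j] / dof for j in range(2)] for i in range(2)]


def discriminant_direction(sample1, sample2):
    """b = S^{-1}(xbar1 - xbar2) for the bivariate case."""
    s = pooled_covariance(sample1, sample2)
    m1, m2 = mean2(sample1), mean2(sample2)
    d = (m1[0] - m2[0], m1[1] - m2[1])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return ((s[1][1] * d[0] - s[0][1] * d[1]) / det,
            (s[0][0] * d[1] - s[1][0] * d[0]) / det)


def w_statistic(x, sample1, sample2):
    """W = [x - (xbar1 + xbar2) / 2]' S^{-1} (xbar1 - xbar2); assumed form of the criterion."""
    b = discriminant_direction(sample1, sample2)
    m1, m2 = mean2(sample1), mean2(sample2)
    return ((x[0] - 0.5 * (m1[0] + m2[0])) * b[0]
            + (x[1] - 0.5 * (m1[1] + m2[1])) * b[1])


group1 = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
group2 = [(4.0, 4.0), (6.0, 4.0), (4.0, 6.0), (6.0, 6.0)]
print(w_statistic((1.0, 1.0), group1, group2))  # positive: closer to the first sample
```

A positive value of the statistic points toward the first population and a negative value toward the second when the cutoff is zero.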


6.5.5. The Likelihood Ratio Criterion 

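Another criterion which can be used in classification is the likelihood ratio
criterion. Consider testing the composite null hypothesis that $x, x_1^{(1)}, \ldots, x_{N_1}^{(1)}$
are drawn from $N(\mu^{(1)}, \Sigma)$ and $x_1^{(2)}, \ldots, x_{N_2}^{(2)}$ are drawn from $N(\mu^{(2)}, \Sigma)$
against the composite alternative hypothesis that $x_1^{(1)}, \ldots, x_{N_1}^{(1)}$ are drawn from
$N(\mu^{(1)}, \Sigma)$ and $x, x_1^{(2)}, \ldots, x_{N_2}^{(2)}$ are drawn from $N(\mu^{(2)}, \Sigma)$, with $\mu^{(1)}$, $\mu^{(2)}$, and
$\Sigma$ unspecified. Under the first hypothesis the maximum likelihood estimators
of $\mu^{(1)}$, $\mu^{(2)}$, and $\Sigma$ are

$\hat{\mu}_1^{(1)} = \dfrac{N_1\bar{x}^{(1)} + x}{N_1 + 1}, \qquad \hat{\mu}_1^{(2)} = \bar{x}^{(2)},$

$\hat{\Sigma}_1 = \dfrac{1}{N_1+N_2+1}\left[\sum_{\alpha=1}^{N_1}(x_\alpha^{(1)}-\hat{\mu}_1^{(1)})(x_\alpha^{(1)}-\hat{\mu}_1^{(1)})' + (x-\hat{\mu}_1^{(1)})(x-\hat{\mu}_1^{(1)})' + \sum_{\alpha=1}^{N_2}(x_\alpha^{(2)}-\hat{\mu}_1^{(2)})(x_\alpha^{(2)}-\hat{\mu}_1^{(2)})'\right].$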

Since 

(26) EK°- ^))(4” - + ( 卜的 )) ( 卜的 n )， 

a=l 

N' 

=E - fLV J )(^ (1) - 

a=l 





The classification problem is invariant with resp 


(34) 






Theorem 6.6.1. ^ 

A^i+iV 2 -2), 



= <t(M) _ < 


+ 2A^ 

+ 去… 

and Vt{-{W+\^), 



























(1973,1974a ， 1974b ， l 1 
it is 

(14) $(-4D) + 






properties of these and other estimators, as did Lachenbruch and Mickey
(1968).

Now consider (12) with $c = Du_1 + \frac{1}{2}D^2$; $u_1$ might be chosen to control








a preassigned 5 with a preassigned confi¬ 


dence level 1 — 

6.6.2. Asymptotic Expansions of the Probabilities of Misclassification
Using Z

We now turn our attention to $Z$ defined by (32) of Section 6.5. The results
are parallel to those for $W$. Memon and Okamoto (1971) expanded the
distribution of $Z$ to terms of order $n^{-2}$, and Siotani and Wang (1975), (1977)
to terms of order $n^{-3}$.

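Theorem 6.6.5. As $N_1 \to \infty$, $N_2 \to \infty$, and $N_1/N_2$ approaches a positive
limit,

(16) $\Pr\left\{\dfrac{Z - \frac{1}{2}\Delta^2}{\Delta} \le u \,\Big|\, \pi_1\right\}$

$= \Phi(u) - \phi(u)\Big\{\dfrac{1}{2N_1\Delta^2}[u^3 + \Delta u^2 + (p-3)u - \Delta]$

$+ \dfrac{1}{2N_2\Delta^2}[u^3 + \Delta u^2 + (p-3-\Delta^2)u - \Delta^3 - \Delta]$

$+ \dfrac{1}{4n}[4u^3 + 4\Delta u^2 + (6p-6+\Delta^2)u + 2(p-1)\Delta]\Big\} + O(n^{-2}),$

and $\Pr\{-(Z + \frac{1}{2}\Delta^2)/\Delta \le u \mid \pi_2\}$ is (16) with $N_1$ and $N_2$ interchanged.

When $c = 0$, then $u = -\frac{1}{2}\Delta$. If $N_1 = N_2$, the rule with $Z$ is identical to the
rule with $W$, and the probability of misclassification is given by (2).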

Fujikoshi and Kanazawa (1976) proved 

Theorem 6.6.6 

( 17 ) Pr{^^<^,} 

= <!>(«) +Am-( p- 1)] 

-I7^[ m2 + 2 Am+ P - 1 + a2 1 

+ ^[ M3 + (4 p -3) M ]}+0( ra _ 2 )，
6.7. CLASSIFICATION INTO ONE OF SEVERAL POPULATIONS

Let us now consider the problem of classifying an observation into one of
several populations. We shall extend the consideration of the previous
sections to the cases of more than two populations. Let $\pi_1, \ldots, \pi_m$ be $m$
populations with density functions $p_1(x), \ldots, p_m(x)$, respectively. We wish to
divide the space of observations into $m$ mutually exclusive and exhaustive
regions $R_1, \ldots, R_m$. If an observation falls into $R_i$, we shall say that it comes
from $\pi_i$. Let the cost of misclassifying an observation from $\pi_i$ as coming from
$\pi_j$ be $C(j \mid i)$. The probability of this misclassification is

(1) $P(j \mid i, R) = \displaystyle\int_{R_j} p_i(x)\,dx.$





is better if at least one inequality is strict. R is admissible if there is no 
procedure R* that is better. A class of procedures is complete if for every 
procedure R outside the class there is a procedure R* in the class that is 
better. 

Now let us show that a Bayes procedure is admissible. Let $R$ be a Bayes











ti Bayes procedure is unique (for the specified probabilities). 
【 procedure is the Bayes procedure for which the risks are 


There are available general treatments of statistical decision procedures
by Wald (1950), Blackwell and Girshick (1954), Ferguson (1967), De Groot
(1970), Berger (1980b), and others.

6.8. CLASSIFICATION INTO ONE OF SEVERAL MULTIVARIATE 
NORMAL POPULATIONS 

We shall now apply the theory of Section 6.7 to the case in which each
population has a normal distribution. [See von Mises (1945).] We assume that
the means are different and the covariance matrices are alike. Let $N(\mu^{(i)}, \Sigma)$
be the distribution of $\pi_i$. The density is given by (1) of Section 6.4. At the
outset the parameters are assumed known. For general costs with known




The constants $c_k$ can be taken nonnegative. These sets of regions form the
class of admissible procedures. For the minimax procedure these constants
are determined so all $P(i \mid i, R)$ are equal.

We now show how to evaluate the probabilities of correct classification. If
$X$ is a random observation, we consider the random variables

(4) $U_{ji} = [X - \tfrac{1}{2}(\mu^{(j)} + \mu^{(i)})]'\Sigma^{-1}(\mu^{(j)} - \mu^{(i)}).$

Here $U_{ji} = -U_{ij}$. Thus we use $m(m-1)/2$ classification functions if the
means span an $(m-1)$-dimensional hyperplane. If $X$ is from $\pi_j$, then $U_{ji}$ is
distributed according to $N(\frac{1}{2}\Delta_{ji}^2, \Delta_{ji}^2)$, where

(5) $\Delta_{ji}^2 = (\mu^{(j)} - \mu^{(i)})'\Sigma^{-1}(\mu^{(j)} - \mu^{(i)}).$

The covariance of $U_{ji}$ and $U_{jk}$ is

(6) $(\mu^{(j)} - \mu^{(i)})'\Sigma^{-1}(\mu^{(j)} - \mu^{(k)}).$

To determine the constants $c_j$ we consider the integrals

(7) $P(j \mid j, R) = \displaystyle\int \cdots \int f_j\,du_{j1} \cdots du_{j,j-1}\,du_{j,j+1} \cdots du_{jm},$

where $f_j$ is the density of $U_{ji}$, $i = 1, 2, \ldots, m$, $i \ne j$.

Theorem 6.8.2. If $\pi_i$ is $N(\mu^{(i)}, \Sigma)$ and the costs of misclassification are
equal, then the regions of classification, $R_1, \ldots, R_m$, that minimize the maximum
conditional expected loss are defined by (3), where $u_{jk}(x)$ is given by (1). The




between 


nearer 


To do this in 
components of t 
sider the three 









Table 6.2

                                        Mean
Measurement             Brahmin ($\pi_1$)   Artisan ($\pi_2$)   Korwa ($\pi_3$)
Stature ($x_1$)              164.51           160.53           158.17
Sitting height ($x_2$)        86.43            81.47            81.16
Nasal depth ($x_3$)           25.49            23.84            21.44
Nasal height ($x_4$)          51.24            48.62            46.72


6.9. AN EXAMPLE OF CLASSIFICATION INTO ONE OF SEVERAL 
MULTIVARIATE NORMAL POPULATIONS 

Rao (1948a) considers three populations consisting of the Brahmin caste
($\pi_1$), the Artisan caste ($\pi_2$), and the Korwa caste ($\pi_3$) of India. The
measurements for each individual of a caste are stature ($x_1$), sitting height ($x_2$),
nasal depth ($x_3$), and nasal height ($x_4$). The means of these variables in
the three populations are given in Table 6.2. The matrix of correlations for


l — 




: 3 )) = I-Yjx ⑴一 〆 3)) 一 一 /2)). Then we calcu- 

一 H〆 0 — fjt 0) ). We obtain the discriminant functions 1 " 
$u_{12}(x) = -0.0708x_1 + 0.4990x_2 + 0.3373x_3 + 0.0887x_4 - 43.13,$
$u_{13}(x) = 0.0003x_1 + 0.3550x_2 + 1.1063x_3 + 0.1375x_4 - 62.49,$
$u_{23}(x) = 0.0711x_1 - 0.1440x_2 + 0.7690x_3 + 0.0488x_4 - 19.36.$

iputations, Rao’s discriminant functions are incorrect. I am indebted to 
stance in the computations. 
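The three discriminant functions can be applied to any vector of the four measurements. The sketch below evaluates them at the population means of Table 6.2 and classifies each mean, assuming equal costs and equal a priori probabilities so that the cutoffs are zero; the leading signs of the coefficients are those for which $u_{23} = u_{13} - u_{12}$ holds, and the function names are illustrative.

```python
MEANS = {
    1: (164.51, 86.43, 25.49, 51.24),  # Brahmin
    2: (160.53, 81.47, 23.84, 48.62),  # Artisan
    3: (158.17, 81.16, 21.44, 46.72),  # Korwa
}


def u12(x):
    return -0.0708 * x[0] + 0.4990 * x[1] + 0.3373 * x[2] + 0.0887 * x[3] - 43.13


def u13(x):
    return 0.0003 * x[0] + 0.3550 * x[1] + 1.1063 * x[2] + 0.1375 * x[3] - 62.49


def u23(x):
    return 0.0711 * x[0] - 0.1440 * x[1] + 0.7690 * x[2] + 0.0488 * x[3] - 19.36


def classify(x):
    """Region R_j requires u_jk(x) >= 0 for every k != j, using u_kj = -u_jk."""
    if u12(x) >= 0 and u13(x) >= 0:
        return 1
    if u12(x) < 0 and u23(x) >= 0:
        return 2
    return 3


for label, mean in MEANS.items():
    print(label, classify(mean), round(u12(mean), 3), round(u13(mean), 3), round(u23(mean), 3))
```

Each population mean is assigned to its own population, as one would expect of any reasonable rule; observations near the boundaries are where the three functions disagree in sign.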

































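(6) $\mathcal{E}[(b'X - b'\mu^{(i)})^2 \mid \pi_i] = b'\,\mathcal{E}[(X-\mu^{(i)})(X-\mu^{(i)})' \mid \pi_i]\,b = b'\Sigma_i b.$

The probability of misclassifying an observation when it comes from the first
population is


(7) $P(2 \mid 1) = \Pr\{b'X \le c \mid \pi_1\} = \Pr\left\{\dfrac{b'X - b'\mu^{(1)}}{(b'\Sigma_1 b)^{1/2}} \le \dfrac{c - b'\mu^{(1)}}{(b'\Sigma_1 b)^{1/2}} \,\Big|\, \pi_1\right\} = 1 - \Phi\left(\dfrac{b'\mu^{(1)} - c}{(b'\Sigma_1 b)^{1/2}}\right).$


The probability of misclassifying an observation when it comes from the
second population is


(8) $P(1 \mid 2) = \Pr\{b'X > c \mid \pi_2\} = \Pr\left\{\dfrac{b'X - b'\mu^{(2)}}{(b'\Sigma_2 b)^{1/2}} > \dfrac{c - b'\mu^{(2)}}{(b'\Sigma_2 b)^{1/2}} \,\Big|\, \pi_2\right\} = 1 - \Phi\left(\dfrac{c - b'\mu^{(2)}}{(b'\Sigma_2 b)^{1/2}}\right).$


It is desired to make these probabilities small or, equivalently, to make the
arguments


(9) $y_1 = \dfrac{b'\mu^{(1)} - c}{(b'\Sigma_1 b)^{1/2}}, \qquad y_2 = \dfrac{c - b'\mu^{(2)}}{(b'\Sigma_2 b)^{1/2}}$


large. We shall consider making $y_1$ large for given $y_2$.
When we eliminate $c$ from (9), we obtain


(10) $y_1 = \dfrac{b'\gamma - y_2(b'\Sigma_2 b)^{1/2}}{(b'\Sigma_1 b)^{1/2}},$

where $\gamma = \mu^{(1)} - \mu^{(2)}$.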

； 12) V 

〔 13) h 

then (11) set equal to 0 is 

(14) ( 

Note that (13) and (14) imp 
satisfying (12) and (13), then 

(15) c=y 2} fb^ 
Then from (9), (12)，and (13: 

(16) 少 ，—— 

Now consider (14) as a ft 
then b = +^2 2 ) _1 7. I 

derivative of with respect 

(17) i 2 7 '[r2,+(l-0: 
= 2ty'[t^ x + (1 - 0^2 

- ， 2 7H + (l-0 

= ty r [a^o-t)^ 2 ] 
+ ^1 


by the following lemma. 














remind the reader that the curve of admissible error probabilities is not
necessarily convex.)

Anderson and Bahadur (1962) studied these linear procedures in general,
including $y_1 < 0$ and $y_2 < 0$. Clunies-Ross and Riffenburgh (1960) approached
the problem from a more geometric point of view.


ant under the transforma- 


PROBLEMS 








The Distribution of the Sample 
Covariance Matrix and the 
Sample Generalized Variance 


7.1. INTRODUCTION 


The sample covariance matrix, $S = [1/(N-1)]\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'$, is










DISTRIBUTION 

ie distribution of $A = \sum_{\alpha=1}^{N}(X_\alpha - \bar{X})(X_\alpha - \bar{X})'$, where
independent, each with the distribution $N(\mu, \Sigma)$. As








































7.3. SOME PROPERTIES OF THE WISHART DISTRIBUTION 
7.3.1. The Characteristic Function 

The characteristic function of the Wishart distribution can be obtained
directly from the distribution of the observations. Suppose $Z_1, \ldots, Z_n$ are
distributed independently, each with density




^ real, there is a real nonsingular matrix B

B0B=D, 

(Theorem A.I..2 of the Appendix). If we let 

l) ^^Qxp(iY'DY) 

=#nexp(W".l} 2 ) 

=r^exp(id"l} 2 ) 












Distributions











Conditional Distributions 


tion 4.3 we considered estimation of the parameters of the conditional 
ution of $X^{(1)}$ given $X^{(2)} = x^{(2)}$. Application of Theorem 7.2.2 to Theo-






vacuous 



⑵. 







independently according to Wi^^N- 1). 

Anderson and Styan (1982) have given a survey of proofs and extensions of 
Cochran’s theorem. 


7.5. THE GENERALIZED VARIANCE 


7.5.1. Definition of the Generalized Variance 


One multivariate analog of the variance $\sigma^2$
covariance matrix $\Sigma$. Another multivariate



called the generalized variance of the multivariate distribution [Wilks (1932);
see also Frisch (1929)]. Similarly, the generalized variance of the sample of
vectors $x_1, \ldots, x_N$ is

(1) $|S| = \left|\dfrac{1}{N-1}\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})'\right|.$

In some sense each of these is a measure of spread. We consider them here 
because the sample generalized variance will recur in many likelihood ratio 
criteria for testing hypotheses. 












volume of the parallelotope 


interpretation of \A\ 
ie matrix (T) he v,. 


see 


all parallelotopes form 
•et 力，…，％. 

… En-i， a 


ILyj-i 



… Edv 

… E^-i 

a y ia … Ew 







$|S|$ is proportional to the sum of squares of the
volumes of all the different parallelotopes formed by using as principal edges $p$
vectors with $p$ of $x_1, \ldots, x_N$ as one set of endpoints and $\bar{x}$ as the other, and the
factor of proportionality is $1/(N-1)^p$.
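The geometric statement above can be checked numerically in two dimensions, where the parallelotopes are parallelograms with edges $x_\alpha - \bar{x}$ and $x_\beta - \bar{x}$. The sketch below compares the sample generalized variance with the scaled sum of squared parallelogram areas; the data and function names are illustrative assumptions.

```python
from itertools import combinations


def deviations(rows):
    n = len(rows)
    m = (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)
    return [(r[0] - m[0], r[1] - m[1]) for r in rows]


def generalized_variance(rows):
    """|S| for bivariate data, with S = A / (N - 1)."""
    n = len(rows)
    d = deviations(rows)
    a11 = sum(v[0] ** 2 for v in d)
    a22 = sum(v[1] ** 2 for v in d)
    a12 = sum(v[0] * v[1] for v in d)
    return (a11 * a22 - a12 ** 2) / (n - 1) ** 2


def parallelogram_sum(rows):
    """Sum of squared areas over all pairs of deviation vectors, times 1 / (N - 1)^2."""
    n = len(rows)
    d = deviations(rows)
    total = sum((u[0] * v[1] - u[1] * v[0]) ** 2 for u, v in combinations(d, 2))
    return total / (n - 1) ** 2


sample = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 5.5)]
print(generalized_variance(sample), parallelogram_sum(sample))
```

The two printed values agree up to rounding; the equality is an instance of the Cauchy-Binet formula applied to the matrix of deviations.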

The population analog of $|S|$ is $|\Sigma|$, which can also be given a geometric
interpretation. From Section 3.3 we know that

(10) $\Pr\{X'\Sigma^{-1}X \le \chi_p^2(\alpha)\} = 1 - \alpha$

if $X$ is distributed according to $N(0, \Sigma)$; that is, the probability is $1 - \alpha$ that
$X$ fall inside the ellipsoid

(11) $x'\Sigma^{-1}x = \chi_p^2(\alpha).$

The volume of this ellipsoid is $C(p)|\Sigma|^{1/2}[\chi_p^2(\alpha)]^{p/2}/p$, where $C(p)$ is defined
in Problem 7.3.
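The volume formula is straightforward to evaluate once $C(p)$ is known. The sketch below assumes $C(p) = 2\pi^{p/2}/\Gamma(\frac{1}{2}p)$, the surface area of the unit sphere in $p$ dimensions; that definition is an assumption here because Problem 7.3 is not reproduced in this excerpt. The chi-square significance point is supplied by the caller, and for $p = 2$ it has the closed form $-2\log\alpha$.

```python
import math


def sphere_surface_constant(p):
    """Assumed form of C(p): surface area of the unit sphere in p dimensions."""
    return 2.0 * math.pi ** (0.5 * p) / math.gamma(0.5 * p)


def ellipsoid_volume(det_sigma, chi2_value, p):
    """Volume of {x : x' Sigma^{-1} x <= chi2_value} = C(p) |Sigma|^{1/2} chi2^{p/2} / p."""
    return sphere_surface_constant(p) * math.sqrt(det_sigma) * chi2_value ** (0.5 * p) / p


alpha = 0.05
chi2_2 = -2.0 * math.log(alpha)  # upper alpha point of chi-square with 2 degrees of freedom
print(ellipsoid_volume(4.0, chi2_2, 2))
```

With $\Sigma = I$ and a significance point of 1 the function returns the volume of the unit ball, which is a quick way to confirm the constant.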


7.5.2. Distribution of the Sample Generalized Variance 












0 and 


2p. 


DISTRIBUTION OF THE SET OF CORRELATION COEFFICIENTS
WHEN THE POPULATION COVARIANCE MATRIX IS DIAGONAL


In Section 4.2.1 we found the distribution of a single sample correlation when
the corresponding population correlation was zero. Here we shall find the
density of the set $r_{ij}$, $i < j$, $i, j = 1, \ldots, p$, when $\rho_{ij} = 0$, $i < j$.

We start with the distribution of $A$ when $\Sigma$ is diagonal; that is,
$W[(\sigma_{ii}\delta_{ij}), n]$. The density of $A$ is

⑴ 


1^1 i( "~ f， ~ 1) exp (- 

2 沁 叩旧 I； (沁) ’ 


( 2 ) 


121 


o-n 0 

0 a 22 


n( 

i=i 


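We make the transformation

(3) $a_{ij} = \sqrt{a_{ii}}\sqrt{a_{jj}}\, r_{ij}$, $i \neq j$,

(4) $a_{ii} = a_{ii}$.

The Jacobian is the determinant of a triangular matrix with diagonal elements
$\sqrt{a_{ii}}\sqrt{a_{jj}}$ (for the pairs $i < j$) and 1. Since each particular subscript $k$,
say, appears in the set $r_{ij}$ ($i < j$) $p - 1$ times, the Jacobian is

(5) $J = \prod_{i=1}^{p} a_{ii}^{\frac12(p-1)}$.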




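If we substitute from (3) and (4) into $w[A \mid (\sigma_{ii}\delta_{ij}), n]$ and multiply by (5), we
obtain as the joint density of $\{a_{ii}\}$ and $\{r_{ij}\}$

(6) $\dfrac{|r_{ij}|^{\frac12(n-p-1)} \prod_{i=1}^{p} a_{ii}^{\frac12 n - 1} \exp\left(-\frac12 \sum_{i=1}^{p} a_{ii}/\sigma_{ii}\right)}{2^{\frac12 np} \prod_{i=1}^{p} \sigma_{ii}^{\frac12 n}\, \Gamma_p\left(\frac12 n\right)} = \dfrac{|r_{ij}|^{\frac12(n-p-1)}}{\Gamma_p\left(\frac12 n\right)} \prod_{i=1}^{p} \left\{ \dfrac{a_{ii}^{\frac12 n - 1} \exp\left[-\frac12 a_{ii}/\sigma_{ii}\right]}{2^{\frac12 n} \sigma_{ii}^{\frac12 n}} \right\}$,

since

(7) $|a_{ij}| = |\sqrt{a_{ii}}\sqrt{a_{jj}}\, r_{ij}| = \prod_{i=1}^{p} a_{ii}\, |r_{ij}|$,

where $r_{ii} = 1$. In the $i$th term of the product on the right-hand side of (6), let
$a_{ii}/(2\sigma_{ii}) = u_i$; then the integral of this term is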

















K^N^T p [{(N + m)} 

(U) ^ P [i(N-l)]V p (\m){N + K)^ 

.| /1 |i(N-p-2)|q,|>|q, + ^ + _^^( ；E _ V )(i-v) < r^ +m) . 

The conditional density of $\mu$ and $\Sigma$ given $\bar{x}$ and $A$ is the ratio of (8) to (11),
namely,

(^ + A ： )' ； p |5 ； r KA,+m+p+2) l^+^ + 7 ^^(i-v)(i-v) , |^ +,,,) 


The estimator of $\Sigma$ is a weighted average of the sample covariances $S$,
and a term deriving from the difference between the sample mean and the a
priori mean. If $N$ is large, the estimator is close to the sample covariance
matrix.


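Theorem 7.7.4. If $x_1, \ldots, x_N$ are observations from $N(\mu, \Sigma)$ and if $\mu$ and
$\Sigma$ have the a priori density $n[\mu \mid \nu, (1/K)\Sigma] \times w^{-1}(\Sigma \mid \Psi, m)$, then the marginal
a posteriori density of $\mu$ given $\bar{x}$ and $S$ is

(15) $\dfrac{(N+K)^{\frac12 p}\, \Gamma[\frac12(N+m+1)]}{\pi^{\frac12 p}\, \Gamma[\frac12(N+m+1-p)]\, |B|^{\frac12}\, [1 + (N+K)(\mu - \mu^*)'B^{-1}(\mu - \mu^*)]^{\frac12(N+m+1)}}$,

where $\mu^*$ is (13) and $B$ is $N + m - p - 1$ times (14).

Proof. The exponent in (12) is $-\frac12$ times

(16) $\operatorname{tr}[B + (N+K)(\mu - \mu^*)(\mu - \mu^*)']\Sigma^{-1}$.

Then the integral of (12) with respect to $\Sigma$ is

(17) $\dfrac{(N+K)^{\frac12 p}\, \Gamma_p[\frac12(N+m+1)]\, |B|^{\frac12(N+m)}}{\pi^{\frac12 p}\, \Gamma_p[\frac12(N+m)]\, |B + (N+K)(\mu - \mu^*)(\mu - \mu^*)'|^{\frac12(N+m+1)}}$.

Since $|B + xx'| = |B|(1 + x'B^{-1}x)$ (Corollary A.3.1), (15) follows. ■

The density (15) is the multivariate $t$-distribution with $N + m + 1 - p$
degrees of freedom. See Section 2.7.5, Examples.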
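The determinant identity of Corollary A.3.1 that closes the proof, $|B + xx'| = |B|(1 + x'B^{-1}x)$, is easy to confirm numerically. The sketch below uses a cofactor determinant and Cramer's rule, which is adequate only for small matrices; the helper names and the test matrix are our own choices.

```python
def det(m):
    """Determinant by cofactor expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total


def solve(b_mat, x):
    """Solve b_mat * y = x by Cramer's rule."""
    d = det(b_mat)
    sol = []
    for j in range(len(x)):
        m = [row[:j] + [x[i]] + row[j + 1:] for i, row in enumerate(b_mat)]
        sol.append(det(m) / d)
    return sol


def both_sides(b_mat, x):
    """Return (|B + xx'|, |B| (1 + x' B^{-1} x))."""
    n = len(x)
    plus = [[b_mat[i][j] + x[i] * x[j] for j in range(n)] for i in range(n)]
    quad = sum(xi * yi for xi, yi in zip(x, solve(b_mat, x)))
    return det(plus), det(b_mat) * (1.0 + quad)
```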


7.8. IMPROVED ESTIMATION OF THE COVARIANCE MATRIX




likelihood function:

(2) $L_l(\Sigma, G) = \operatorname{tr} G\Sigma^{-1} - \log|G\Sigma^{-1}| - p$.

(See Lemma 3.2.2 and alternative proofs in Problems 3.4, 3.8, and 3.12.) Each
of these is 0 when $G = \Sigma$ and is positive when $G \neq \Sigma$. The second loss
function approaches $\infty$ as $G$ approaches a singular matrix or when one or
more elements (or one or more characteristic roots) of $G$ approaches $\infty$. (See


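Proof. By the invariance of the loss function,

(3) $\mathcal{E}_\Sigma L_q(\Sigma, aA) = \mathcal{E}_I L_q(I, aA^*)$

$= \mathcal{E}_I \operatorname{tr}(aA^* - I)^2$

$= \mathcal{E}_I \left( a^2 \sum_{i,j=1}^{p} a_{ij}^{*2} - 2a \sum_{i=1}^{p} a_{ii}^* + p \right)$

$= a^2[(2n + n^2)p + np(p - 1)] - 2anp + p$

$= p[n(n + p + 1)a^2 - 2na + 1]$,

which has its minimum at $a = 1/(n + p + 1)$. Similarly

(4) $\mathcal{E}_\Sigma L_l(\Sigma, aA) = \mathcal{E}_I L_l(I, aA^*)$

$= \mathcal{E}_I\{a \operatorname{tr} A^* - \log|A^*| - p \log a - p\}$

$= p[na - \log a - 1] - \mathcal{E}_I \log|A^*|$,

which is minimized at $a = 1/n$. ■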
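The two risk functions in the proof can be tabulated to see where the minima fall. The sketch below evaluates the quadratic risk $p[n(n+p+1)a^2 - 2na + 1]$ and the part of the likelihood risk that depends on $a$, namely $p[na - \log a - 1]$; the function names are ours, and the term $\mathcal{E}_I \log|A^*|$ is omitted because it does not involve $a$.

```python
import math


def quadratic_risk(a, n, p):
    """Risk of aA under quadratic loss: p[n(n+p+1)a^2 - 2na + 1]."""
    return p * (n * (n + p + 1) * a * a - 2.0 * n * a + 1.0)


def likelihood_risk_part(a, n, p):
    """The part of the likelihood risk of aA that depends on a."""
    return p * (n * a - math.log(a) - 1.0)


def best_multiplier_quadratic(n, p):
    """Minimizer of the quadratic risk over a."""
    return 1.0 / (n + p + 1)
```

At $a = 1/(n+p+1)$ the quadratic risk equals $p(p+1)/(n+p+1)$, which is smaller than its value $p(p+1)/n$ at the unbiased choice $a = 1/n$.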





/,/th component of (5) is 

( 6 ) 




■ 


where $D$ is a diagonal matrix not depending on $A$. We note in passing that if
(5) holds for all nonsingular $H$, then $D = aI$ for some $a$. ($H$ can be taken as
a permutation matrix.)

If $\Sigma = KK'$, where $K$ is lower triangular, then

⑻ 

^L[X,G(/l)]= /L[2,G(/()]C(p,n)|Xr^|/t|^- p - 1) e -^ S " U ^ 

^ ,G(A)\C{p,n)\KK'\-' in \A\'^- l) 


，- wn ' 以 


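(9) $\mathcal{E}_I L_q[I, G(A)] = \mathcal{E}_I L_q[I, TDT']$

$= \mathcal{E}_I \operatorname{tr}(TDT' - I)^2$

$= \mathcal{E}_I \operatorname{tr}(TDT'TDT' - 2TDT' + I)$

$= \mathcal{E}_I \left( \sum_{i,j,k,l=1}^{p} t_{ij}d_jt_{kj}t_{kl}d_lt_{il} - 2\sum_{i,j=1}^{p} t_{ij}^2 d_j + p \right)$.


The expectations can be evaluated by using the fact that the (nonzero)
elements of $T$ are independent, $t_{ii}^2$ has the $\chi^2$-distribution with $n + 1 - i$
degrees of freedom, and $t_{ij}$, $i > j$, has the distribution $N(0, 1)$.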
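The distributional facts about $T$ used here can be illustrated by simulation. The sketch below draws a lower triangular $T$ with $t_{ii}^2$ chi-square with $n + 1 - i$ degrees of freedom and $t_{ij}$ ($i > j$) standard normal, forms $A = TT'$, and averages $\operatorname{tr} A$, whose expectation is $np$ when $\Sigma = I$. The function names, the seed, and the use of `random.gammavariate` to draw chi-square variates are our assumptions.

```python
import random


def triangular_factor(p, n, rng):
    """Lower triangular T with the independent elements described in the text."""
    t = [[0.0] * p for _ in range(p)]
    for i in range(p):
        # With 0-based i, chi-square degrees of freedom n + 1 - (i + 1) = n - i;
        # a chi-square variate with k d.f. is Gamma(shape k/2, scale 2).
        t[i][i] = rng.gammavariate((n - i) / 2.0, 2.0) ** 0.5
        for j in range(i):
            t[i][j] = rng.gauss(0.0, 1.0)
    return t


def wishart_identity(p, n, rng):
    """A = TT', distributed as W(I, n)."""
    t = triangular_factor(p, n, rng)
    return [[sum(t[i][k] * t[j][k] for k in range(p)) for j in range(p)]
            for i in range(p)]


def mean_trace(p, n, reps, seed=0):
    """Monte Carlo average of tr A over reps draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        a = wishart_identity(p, n, rng)
        total += sum(a[i][i] for i in range(p))
    return total / reps
```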


( 10 ) 


S,L q [l,G{A)]=d 















.118 for n = 3, 0.065 for n = 4, etc. The risk (19) is 
roblem 7.31.) 

;e of these estimators is that they depend on the 
ne the /th nermutation matrix- i — 1. n\. and \et 






aller likelihood risk than S. 

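7.9. ELLIPTICALLY CONTOURED DISTRIBUTIONS
7.9.1. Observations Elliptically Contoured

Consider $x_1, \ldots, x_N$ observations on a random vector $X$ with density

(1) $|\Lambda|^{-\frac12} g[(x - \nu)'\Lambda^{-1}(x - \nu)]$.

Let $A = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})'$, $n = N - 1$, $S = (1/n)A$. Then $S \xrightarrow{p} \Sigma$ as
$N \to \infty$. The limiting normal distribution of $\sqrt{N} \operatorname{vec}(S - \Sigma)$ was given in
Theorem 3.6.2.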




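$S \xrightarrow{p} I$ and $T \xrightarrow{p} I$, $\sqrt{N}(S - I)$ and $\sqrt{N}(T - I)$ have limiting normal distributions,
and

(2) $\sqrt{N}(S - I) = \sqrt{N}(T - I) + \sqrt{N}(T - I)' + O_p(N^{-\frac12})$.

That is, $\sqrt{N}(s_{ii} - 1) = 2\sqrt{N}(t_{ii} - 1) + O_p(N^{-\frac12})$, and $\sqrt{N}s_{ij} = \sqrt{N}t_{ij} + O_p(N^{-\frac12})$, $i > j$.
When $\Sigma = I$, the set $\sqrt{N}(s_{11} - 1), \ldots, \sqrt{N}(s_{pp} - 1)$ and the set $\sqrt{N}s_{ij}$, $i > j$,
are asymptotically independent; $\sqrt{N}s_{12}, \ldots, \sqrt{N}s_{p-1,p}$ are mutually asymptotically independent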



this further here. 

7.9.2. Elliptically Contoured Matrix Distributions 

Let $X$ ($N \times p$) have the density

(4) $|C|^{-N} g[C^{-1}(X - \varepsilon_N\nu')'(X - \varepsilon_N\nu')(C')^{-1}]$

based on the left spherical density $g(Y'Y)$.

Theorem 7.9.2. Define $T = (t_{ij})$ by $Y'Y = TT'$, $t_{ij} = 0$, $i < j$, and $t_{ii} \ge 0$. If
the density of $Y$ is $g(Y'Y)$, then the density of $T$ is




























CHAPTER 8 


Testing the General Linear 
Hypothesis; Multivariate 
Analysis of Variance 


8.1. INTRODUCTION 

In this chapter we generalize the univariate least squares theory (i.e., regression
analysis) and the analysis of variance to vector variates. The algebra of
the multivariate case is essentially the same as that of the univariate case.
This leads to distribution theory that is analogous to that of the univariate
case and to test criteria that are analogs of $F$-statistics. In fact, given a
univariate test, we shall be able to write down immediately a corresponding
multivariate test. Since the analysis of variance based on the model of fixed
effects can be obtained from least squares theory, we obtain directly a theory
of multivariate analysis of variance. However, in the multivariate case there is
more latitude in the choice of tests of significance.

In univariate least squares we consider scalar dependent variates $x_1, \ldots, x_N$
drawn from populations with expected values $\beta'z_1, \ldots, \beta'z_N$, respectively,
where $\beta$ is a column vector of $q$ components and each of the $z_\alpha$ is a column
vector of $q$ known components. Under the assumption that the variances in
the populations are the same, the least squares estimator of $\beta$ is

(1) $b = \left( \sum_{\alpha=1}^{N} z_\alpha z_\alpha' \right)^{-1} \sum_{\alpha=1}^{N} z_\alpha x_\alpha$.
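For concreteness, the univariate least squares estimator $\left(\sum z_\alpha z_\alpha'\right)^{-1}\sum z_\alpha x_\alpha$ can be computed as follows when $q = 2$. This is a minimal sketch: the function name `least_squares_q2` and the explicit $2 \times 2$ inversion are our choices, and the data in the check are invented.

```python
def least_squares_q2(zs, xs):
    """b = (sum z z')^{-1} sum z x for q = 2 regressors.

    zs: list of [z1, z2] pairs; xs: list of scalar dependent variates.
    """
    a11 = sum(z[0] * z[0] for z in zs)
    a12 = sum(z[0] * z[1] for z in zs)
    a22 = sum(z[1] * z[1] for z in zs)
    c1 = sum(z[0] * x for z, x in zip(zs, xs))
    c2 = sum(z[1] * x for z, x in zip(zs, xs))
    d = a11 * a22 - a12 * a12  # determinant of A = sum z z'
    return [(a22 * c1 - a12 * c2) / d, (a11 * c2 - a12 * c1) / d]
```

With $z_\alpha = (1, \alpha)'$ and $x_\alpha = 1 + 2\alpha$ exactly, the estimator recovers the intercept 1 and slope 2.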
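$= A^{-1} \mathcal{E} \sum_{\alpha=1}^{N} (X_{i\alpha} - \mathcal{E}X_{i\alpha}) z_\alpha \sum_{\gamma=1}^{N} (X_{j\gamma} - \mathcal{E}X_{j\gamma}) z_\gamma' A^{-1}$

$= A^{-1} \sum_{\alpha,\gamma=1}^{N} \mathcal{E}(X_{i\alpha} - \mathcal{E}X_{i\alpha})(X_{j\gamma} - \mathcal{E}X_{j\gamma}) z_\alpha z_\gamma' A^{-1}$

$= A^{-1} \sum_{\alpha,\gamma=1}^{N} \delta_{\alpha\gamma}\sigma_{ij} z_\alpha z_\gamma' A^{-1}$

$= \sigma_{ij} A^{-1}$.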

pq components pp) f = vec 自 'is nor- 

& ， ... ， pp , = vecp , and covariance matrix 

^n A ~ X a \ P A 

(t 22 A~ 1 … (i 2p A 

^ 2^ _1 … (T pp A 

( 八 r nrfwliu't <、f thn mntrirp« ^ and 



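The density then can be written [by virtue of (4)]

(17) $\dfrac{1}{(2\pi)^{\frac12 pN} |\Sigma|^{\frac12 N}} \exp\left\{ -\tfrac12 \operatorname{tr} \Sigma^{-1}\left[ (\hat\beta - \beta)A(\hat\beta - \beta)' + N\hat\Sigma \right] \right\}$.

This proves the following:

Corollary 8.2.1. $\hat\beta$ and $\hat\Sigma$ form a sufficient set of statistics for $\beta$ and $\Sigma$.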

A useful theorem is the following. 

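Theorem 8.2.3. Let $X_\alpha$ be distributed according to $N(\beta z_\alpha, \Sigma)$, $\alpha = 1, \ldots, N$,
and suppose $X_1, \ldots, X_N$ are independent.

(a) If $w_\alpha = Hz_\alpha$ and $\Gamma = \beta H^{-1}$, then $X_\alpha$ is distributed according to
$N(\Gamma w_\alpha, \Sigma)$.

(b) The maximum likelihood estimator of $\Gamma$ based on observations $x_\alpha$ on $X_\alpha$,
$\alpha = 1, \ldots, N$, is $\hat\Gamma = \hat\beta H^{-1}$, where $\hat\beta$ is the maximum likelihood estimator
of $\beta$.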

(^\ r(V iv 1*/ 、卜 =0>48' where A = and the rna:imum likeli- 
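Theorem 8.2.4. The least squares estimator is the best linear unbiased
estimator of $\beta_{ig}$.

Proof. Let $\tilde\beta_{ig} = \sum_{\alpha=1}^{N}\sum_{j=1}^{p} f_{j\alpha}X_{j\alpha}$ be an arbitrary unbiased estimator of
$\beta_{ig}$, and let $\hat\beta_{ig} = \sum_{\alpha=1}^{N}\sum_{h=1}^{q} X_{i\alpha}z_{h\alpha}a^{hg}$ be the least squares estimator, where
$A = \sum_{\alpha=1}^{N} z_\alpha z_\alpha'$ and $a^{hg}$ denotes the $h,g$th element of $A^{-1}$. Then

(20) $\mathcal{E}(\tilde\beta_{ig} - \beta_{ig})^2 = \mathcal{E}[(\hat\beta_{ig} - \beta_{ig}) + (\tilde\beta_{ig} - \hat\beta_{ig})]^2$

$= \mathcal{E}(\hat\beta_{ig} - \beta_{ig})^2 + 2\mathcal{E}(\hat\beta_{ig} - \beta_{ig})(\tilde\beta_{ig} - \hat\beta_{ig}) + \mathcal{E}(\tilde\beta_{ig} - \hat\beta_{ig})^2$.

Because $\tilde\beta_{ig}$ and $\hat\beta_{ig}$ are unbiased, $\tilde\beta_{ig} - \beta_{ig} = \sum_{\alpha=1}^{N}\sum_{j=1}^{p} f_{j\alpha}u_{j\alpha}$, $\hat\beta_{ig} - \beta_{ig} = \sum_{\alpha=1}^{N}\sum_{h=1}^{q} u_{i\alpha}z_{h\alpha}a^{hg}$,
where $u_{j\alpha} = X_{j\alpha} - \mathcal{E}X_{j\alpha}$, and

(21) $\tilde\beta_{ig} - \hat\beta_{ig} = \sum_{\alpha=1}^{N}\sum_{j=1}^{p} \left[ f_{j\alpha} - \delta_{ij}\sum_{h=1}^{q} z_{h\alpha}a^{hg} \right] u_{j\alpha}$,

where $\delta_{ii} = 1$ and $\delta_{ij} = 0$, $i \neq j$. Then

(22) $\mathcal{E}(\hat\beta_{ig} - \beta_{ig})(\tilde\beta_{ig} - \hat\beta_{ig})$

$= \mathcal{E}\sum_{\alpha,\gamma=1}^{N}\sum_{h=1}^{q}\sum_{j=1}^{p} z_{h\alpha}a^{hg}u_{i\alpha}\left[ f_{j\gamma} - \delta_{ij}\sum_{h'=1}^{q} z_{h'\gamma}a^{h'g} \right] u_{j\gamma}$

$= \sum_{\alpha=1}^{N}\sum_{h=1}^{q}\sum_{j=1}^{p} z_{h\alpha}a^{hg}\sigma_{ij}\left[ f_{j\alpha} - \delta_{ij}\sum_{h'=1}^{q} z_{h'\alpha}a^{h'g} \right]$

$= \sigma_{ii}a^{gg} - \sigma_{ii}\sum_{h=1}^{q}\sum_{h'=1}^{q} a_{hh'}a^{hg}a^{h'g}$

$= 0$.

The third equality uses $\sum_{\alpha=1}^{N} f_{j\alpha}z_{h\alpha} = \delta_{ij}\delta_{hg}$, which is the condition for $\tilde\beta_{ig}$ to be
unbiased. Then (20) implies $\mathcal{E}(\tilde\beta_{ig} - \beta_{ig})^2 \ge \mathcal{E}(\hat\beta_{ig} - \beta_{ig})^2$. ■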

8.3. LIKELIHOOD RATIO CRITERIA FOR TESTING LINEAR
HYPOTHESES ABOUT REGRESSION COEFFICIENTS


8.3.1. Likelihood Ratio Criteria
Suppose we partition

(1) $\beta = (\beta_1 \ \beta_2)$













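From the orthogonality relations we have

(19) $(X - \beta Z)(X - \beta Z)'$

$= (X - \hat\beta_\Omega Z)(X - \hat\beta_\Omega Z)' + (\hat\beta_{2\omega} - \beta_2)Z_2Z_2'(\hat\beta_{2\omega} - \beta_2)'$

$+ (\hat\beta_{1\Omega} - \beta_1^*)(Z_1 - A_{12}A_{22}^{-1}Z_2)(Z_1 - A_{12}A_{22}^{-1}Z_2)'(\hat\beta_{1\Omega} - \beta_1^*)'$

$= N\hat\Sigma_\Omega + (\hat\beta_{2\omega} - \beta_2)A_{22}(\hat\beta_{2\omega} - \beta_2)'$

$+ (\hat\beta_{1\Omega} - \beta_1^*)(A_{11} - A_{12}A_{22}^{-1}A_{21})(\hat\beta_{1\Omega} - \beta_1^*)'$.

If we subtract $(\hat\beta_{2\omega} - \beta_2)Z_2$ from both sides of (18), we have

(20) $X - \beta_1^*Z_1 - \hat\beta_{2\omega}Z_2 = (X - \hat\beta_\Omega Z) + (\hat\beta_{1\Omega} - \beta_1^*)(Z_1 - A_{12}A_{22}^{-1}Z_2)$.

From this we obtain

(21) $(X - \beta_1^*Z_1 - \hat\beta_{2\omega}Z_2)(X - \beta_1^*Z_1 - \hat\beta_{2\omega}Z_2)'$

$= (X - \hat\beta_\Omega Z)(X - \hat\beta_\Omega Z)'$

$+ (\hat\beta_{1\Omega} - \beta_1^*)(Z_1 - A_{12}A_{22}^{-1}Z_2)(Z_1 - A_{12}A_{22}^{-1}Z_2)'(\hat\beta_{1\Omega} - \beta_1^*)'$

$= N\hat\Sigma_\Omega + (\hat\beta_{1\Omega} - \beta_1^*)(A_{11} - A_{12}A_{22}^{-1}A_{21})(\hat\beta_{1\Omega} - \beta_1^*)'$.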
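The decomposition (21) can be checked numerically in the simplest case $p = 1$, $q_1 = q_2 = 1$, where every matrix is a scalar. The sketch below fits the full model and the model with $\beta_1$ fixed at a hypothetical value `beta1_star`, then compares the two residual sums of squares; the function name and the returned keys are our own assumptions.

```python
def nested_fits(x, z1, z2, beta1_star):
    """Residual sums of squares under the full model and under beta_1 = beta1_star."""
    dot = lambda u, v: sum(s * t for s, t in zip(u, v))
    a11, a12, a22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    c1, c2 = dot(x, z1), dot(x, z2)
    d = a11 * a22 - a12 * a12
    b1 = (a22 * c1 - a12 * c2) / d           # estimate of beta_1 in the full model
    b2 = (a11 * c2 - a12 * c1) / d           # estimate of beta_2 in the full model
    rss_full = sum((xi - b1 * u - b2 * v) ** 2 for xi, u, v in zip(x, z1, z2))
    b2_null = (c2 - beta1_star * a12) / a22  # estimate of beta_2 with beta_1 fixed
    rss_null = sum((xi - beta1_star * u - b2_null * v) ** 2
                   for xi, u, v in zip(x, z1, z2))
    a11_2 = a11 - a12 * a12 / a22            # A_{11.2}
    return {"rss_full": rss_full, "rss_null": rss_null,
            "b1": b1, "a11_2": a11_2, "u": rss_full / rss_null}
```

For $p = 1$ the ratio `u` of the two residual sums of squares is the ratio of the determinants $|\hat\Sigma_\Omega|$ and $|\hat\Sigma_\omega|$ on which the likelihood ratio criterion is based.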

The determinant $|\hat\Sigma_\Omega| = (1/N^p)|(X - \hat\beta_\Omega Z)(X - \hat\beta_\Omega Z)'|$ is proportional
to the volume squared of the parallelotope spanned by the row vectors of
$X - \hat\beta_\Omega Z$ (translated to the origin). The determinant $|\hat\Sigma_\omega| = (1/N^p)|(X - \beta_1^*Z_1 - \hat\beta_{2\omega}Z_2)(X - \beta_1^*Z_1 - \hat\beta_{2\omega}Z_2)'|$
is proportional to the volume squared
of the parallelotope spanned by the row vectors of $X - \beta_1^*Z_1 - \hat\beta_{2\omega}Z_2$ (translated
to the origin); each of these vectors is the part of the vector of
$X - \beta_1^*Z_1$ that is orthogonal to $Z_2$. Thus the test based on the likelihood ratio
criterion depends on the ratio of volumes of parallelotopes. One parallelotope
involves vectors orthogonal to $Z$, and the other involves vectors orthogonal
to $Z_2$.

From (15) we see that the density of x ]t ...,x N can be written as 

(22) P 2 )， 

+ (Pl^ _ P! )(^11 -^ 12 ^ 22 ^ 2 l)(Pin ~ Pi ) 1 })* 
Thus, $\hat\Sigma_\Omega$, $\hat\beta_{1\Omega}$, and $\hat\beta_{2\omega}$ form a sufficient set of statistics for $\Sigma$, $\beta_1$, and $\beta_2$.


(25) $Q = \begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \end{pmatrix} = \begin{pmatrix} P_1(Z_1 - A_{12}A_{22}^{-1}Z_2) \\ P_2Z_2 \\ Q_3 \end{pmatrix}$,


where $Q_3$ is any $n \times N$ matrix making $Q$ orthogonal. Then the columns of

(26) $W = (W_1 \ W_2 \ W_3) = XQ' = X(Q_1' \ Q_2' \ Q_3')$

are independently normally distributed with covariance matrix $\Sigma$ (Theorem
3.3.1). Then

Wilks (1932) first gave the likelihood ratio criterion for testing the equality 
of mean vectors from several populations (Section 8.8). Wilks (1934) and 
Bartlett (1934) extended its use to regression coefficients. 

8.3.3. The Canonical Form 

In studying the distributions of criteria it will be convenient to put the
distribution of the observations in canonical form. This amounts to picking a
coordinate system in the $N$-dimensional space so that the first $q_1$ coordinate
axes are in the space of $Z$ that is orthogonal to $Z_2$, the next $q_2$ coordinate
axes are in the space of $Z_2$, and the last $n$ ($= N - q$) coordinate axes are
orthogonal to the $Z$-space.

Let $P_2$ be a $q_2 \times q_2$ matrix such that

(23) $I = P_2A_{22}P_2' = (P_2Z_2)(P_2Z_2)'$,

and let $P_1$ be a $q_1 \times q_1$ matrix such that ($A_{11 \cdot 2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$)

(24) $I = P_1A_{11 \cdot 2}P_1' = [P_1(Z_1 - A_{12}A_{22}^{-1}Z_2)][P_1(Z_1 - A_{12}A_{22}^{-1}Z_2)]'$.

Then define the NxN orthogonal matrix Q as 


' g1 ' g2lfi3 




^z 2 y 






















and G i and H { are the submatrices of G and H, respectively, of the first i 
rows and columns. Correspondingly, let consist of the first / components 
0 f y = Xq - p^ 1 )’ We shall show that is the length squared 

of the vector from : y,* = ( 如 … ’ 如 ） to its projection on Z and = 
divided by the length squared of the vector from 兄 to its 
projection on Z 2 and 

Lemma 8.4.3. Let $y$ be an $N$-component row vector and $U$ an $r \times N$ matrix.
Then the sum of squares of the residuals of $y$ from its regression on $U$ is

(16) $\dfrac{\begin{vmatrix} yy' & yU' \\ Uy' & UU' \end{vmatrix}}{|UU'|}$.
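Lemma 8.4.3 can be checked numerically for $r = 2$. The sketch below computes the residual sum of squares of $y$ on the two rows of $U$ by solving the normal equations and compares it with the ratio of determinants in the lemma; the function names and the explicit $3 \times 3$ cofactor expansion are our choices.

```python
def _dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def residual_ss(y, u1, u2):
    """Sum of squared residuals of y regressed on the rows u1, u2 of U."""
    g11, g12, g22 = _dot(u1, u1), _dot(u1, u2), _dot(u2, u2)
    c1, c2 = _dot(y, u1), _dot(y, u2)
    d = g11 * g22 - g12 * g12
    b1 = (g22 * c1 - g12 * c2) / d
    b2 = (g11 * c2 - g12 * c1) / d
    return sum((yi - b1 * a - b2 * b) ** 2 for yi, a, b in zip(y, u1, u2))


def determinant_ratio(y, u1, u2):
    """| yy' yU' ; Uy' UU' | / |UU'| for r = 2."""
    g11, g12, g22 = _dot(u1, u1), _dot(u1, u2), _dot(u2, u2)
    c1, c2 = _dot(y, u1), _dot(y, u2)
    yy = _dot(y, y)
    # Cofactor expansion of the bordered 3 x 3 determinant along its first row.
    det3 = (yy * (g11 * g22 - g12 * g12)
            - c1 * (c1 * g22 - g12 * c2)
            + c2 * (c1 * g12 - g11 * c2))
    return det3 / (g11 * g22 - g12 * g12)
```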







m*=p, and «-p-l replaced by n* - p* - I 
that U p . m ,„ given by (14) is \I m -U^U'J =C / m , ； 


Theorem 8.4.2. When the hypothesis is true, t 
is the same as that of U qi< p ' N - p - qi (i.e., that of 


Since (ll)'is a density and hence integrates to i, Dy cn 




From this fact we see that the ^th moment of is 


,ri T\Hn + m + l-i)\ . + n - dv 

( 2o ) ^ = LmuiTT^m>) ( 

= - r[K» + i- i )ir[i(« + w + 1 - / ) + "l ’ 


K j, . • • j ^ p * - 1 - , 

the following theorem: 

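Theorem 8.4.3. The $h$th moment of $U$ [if $h > -\frac12(n + 1 - p)$] is

(21) $\mathcal{E}U^h = \prod_{i=1}^{p} \dfrac{\Gamma[\frac12(n + 1 - i) + h]\, \Gamma[\frac12(n + m + 1 - i)]}{\Gamma[\frac12(n + 1 - i)]\, \Gamma[\frac12(n + m + 1 - i) + h]}$

$= \prod_{i=1}^{p} \dfrac{\Gamma[\frac12(N - q_1 - q_2 + 1 - i) + h]\, \Gamma[\frac12(N - q_2 + 1 - i)]}{\Gamma[\frac12(N - q_1 - q_2 + 1 - i)]\, \Gamma[\frac12(N - q_2 + 1 - i) + h]}$.


In the first expression $p$ can be replaced by $m$, $m$ by $p$, and $n$ by
$n + m - p$.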

Suppose 」 







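Then the $h$th moment of $U_{2r,m,n}$ is

(23) $\mathcal{E}U_{2r,m,n}^h = \prod_{j=1}^{r} \left\{ \dfrac{\Gamma[\frac12(m + n + 2) - j]\, \Gamma[\frac12(m + n + 1) - j]}{\Gamma[\frac12(m + n + 2) - j + h]\, \Gamma[\frac12(m + n + 1) - j + h]} \cdot \dfrac{\Gamma[\frac12(n + 2) - j + h]\, \Gamma[\frac12(n + 1) - j + h]}{\Gamma[\frac12(n + 2) - j]\, \Gamma[\frac12(n + 1) - j]} \right\}$

$= \prod_{j=1}^{r} \left\{ \dfrac{\Gamma(m + n + 1 - 2j)\, \Gamma(n + 1 - 2j + 2h)}{\Gamma(m + n + 1 - 2j + 2h)\, \Gamma(n + 1 - 2j)} \right\}$.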
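The moment formulas (21) and (23) lend themselves to a quick numerical check. The sketch below evaluates the product of gamma functions in (21) on the log scale with `math.lgamma`, and the compact form (23) for even $p = 2r$; the function names are ours, and the formulas are coded as reconstructed in this section.

```python
import math


def u_moment(h, p, m, n):
    """h-th moment of U_{p,m,n} from the product of gamma functions (21)."""
    log_val = 0.0
    for i in range(1, p + 1):
        log_val += (math.lgamma(0.5 * (n + 1 - i) + h)
                    + math.lgamma(0.5 * (n + m + 1 - i))
                    - math.lgamma(0.5 * (n + 1 - i))
                    - math.lgamma(0.5 * (n + m + 1 - i) + h))
    return math.exp(log_val)


def u_moment_even_p(h, r, m, n):
    """h-th moment of U_{2r,m,n} from the compact form (23)."""
    log_val = 0.0
    for j in range(1, r + 1):
        log_val += (math.lgamma(m + n + 1 - 2 * j)
                    + math.lgamma(n + 1 - 2 * j + 2 * h)
                    - math.lgamma(m + n + 1 - 2 * j + 2 * h)
                    - math.lgamma(n + 1 - 2 * j))
    return math.exp(log_val)
```

Three checks are natural: for $p = 1$ the first moment is $n/(n + m)$, the mean of a beta variable with parameters $\frac12 n$ and $\frac12 m$; interchanging $p$ and $m$ while replacing $n$ by $n + m - p$ leaves the moments unchanged; and (23) agrees with (21) when $p = 2r$.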

It is clear from the definition of the beta function that (23) is 







the /= 1 ， ...，■?，are independent and Z t has density 
and Z J + i is independently distributed with density )3[z;^(n 


From t 


Distributions 

characterization we see that the density of U l m ’ n is 


(26) 


Another way of writing U l m n is 


where $g_{11}$ is the one element of $G = N\hat\Sigma_\Omega$ and $F_{m,n}$ is an $F$-statistic. Thus

(28) $\dfrac{1 - U_{1,m,n}}{U_{1,m,n}} \cdot \dfrac{n}{m} = F_{m,n}$.

Theorem 8.4.5. The distribution of $[(1 - U_{1,m,n})/U_{1,m,n}] \cdot n/m$ is the
$F$-distribution with $m$ and $n$ degrees of freedom; the distribution of
$[(1 - U_{p,1,n})/U_{p,1,n}] \cdot (n + 1 - p)/p$ is the $F$-distribution with $p$ and $n + 1 - p$
degrees of freedom.
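In practice Theorem 8.4.5 is used to convert an observed value of $U$ into an $F$-statistic. A minimal sketch follows; the function names are ours, and the returned degrees of freedom are those stated in the theorem.

```python
def f_from_u_p1(u, m, n):
    """Case p = 1: returns (F, df1, df2) with F = [(1 - U)/U] * n/m."""
    return (1.0 - u) / u * n / m, m, n


def f_from_u_m1(u, p, n):
    """Case m = 1: returns (F, df1, df2) with F = [(1 - U)/U] * (n + 1 - p)/p."""
    return (1.0 - u) / u * (n + 1 - p) / p, p, n + 1 - p
```

Small values of $U$ correspond to large values of $F$, so the hypothesis is rejected when the computed statistic exceeds the appropriate upper significance point of the $F$-distribution.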


P = 2 












.This procedure, of course, is not invariant with respect to linear 
formation of the dependent vector variable. However, before carrying 
step-down procedure, a linear transformation can be used to determine 
The factors can also be used to obtain confidence regions for $\beta_{11}, \ldots, \beta_{p1}$.
Let $u_i(\varepsilon_i)$ be defined by


(44) 




咖 i) 


• ^m,n-i + \ ( ^/) * 


Then a confidence region for $\beta_{i1}$ of confidence $1 - \varepsilon_i$ is


(45) 
8.5. AN ASYMPTOTIC EXPANSION OF THE DISTRIBUTION
OF THE LIKELIHOOD RATIO CRITERION


8.5.1. General Theory of Asymptotic Expansions 




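where $x_k = \frac12 N = y_j$, $\xi_k = \frac12(-q + 1 - k)$, $\eta_j = \frac12(-q_2 + 1 - j)$, $a = b = p$. We
treat a more general case here because applications later in this book require
it.

If we let

(3) $M = -2 \log W$,

the characteristic function of $\rho M$ ($0 \le \rho < 1$) is

(4) $\phi(t) = \mathcal{E}e^{it\rho M}$

$= \mathcal{E}W^{-2it\rho}$

$= K \left( \dfrac{\prod_{j=1}^{b} y_j^{y_j}}{\prod_{k=1}^{a} x_k^{x_k}} \right)^{-2it\rho} \dfrac{\prod_{k=1}^{a} \Gamma[x_k(1 - 2it\rho) + \xi_k]}{\prod_{j=1}^{b} \Gamma[y_j(1 - 2it\rho) + \eta_j]}$.

Here $\rho$ is arbitrary; later it will depend on $N$. If $a = b$, $x_k = y_k$, $\xi_k \le \eta_k$, then
(1) is the $h$th moment of the product of powers of variables with beta
distributions, and then (1) holds for all $h$ for which the gamma functions
exist. In this case (4) is valid for all real $t$. We shall assume here that (4) holds
for all real $t$, and in each case where we apply the result we shall verify this
assumption.

Let

(5) $\Phi(t) = \log \phi(t) = g(t) - g(0)$,

where

$g(t) = 2it\rho \left[ \sum_{k=1}^{a} x_k \log x_k - \sum_{j=1}^{b} y_j \log y_j \right]$

$+ \sum_{k=1}^{a} \log \Gamma[\rho x_k(1 - 2it) + \beta_k + \xi_k]$

$- \sum_{j=1}^{b} \log \Gamma[\rho y_j(1 - 2it) + \varepsilon_j + \eta_j]$,

where $\beta_k = (1 - \rho)x_k$ and $\varepsilon_j = (1 - \rho)y_j$. The form $g(t) - g(0)$ makes $\Phi(0) = 0$,
which agrees with the fact that $K$ is such that $\phi(0) = 1$. We make use of an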











$B_1(h) = h - \frac12$,

$B_2(h) = h^2 - h + \frac16$,

$B_3(h) = h^3 - \frac32 h^2 + \frac12 h$.
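The role of these polynomials can be seen by comparing the truncated expansion of $\log \Gamma(x + h)$ with a direct evaluation. The sketch below assumes the standard form of the expansion, $\log\Gamma(x + h) = \log\sqrt{2\pi} + (x + h - \frac12)\log x - x - \sum_{r=1}^{m} (-1)^r B_{r+1}(h)/[r(r+1)x^r] + R_{m+1}(x)$, which we take to be the formula intended at this point (the display itself is not legible here). The function names are ours and only $m \le 2$ is supported.

```python
import math


def b1(h):
    return h - 0.5


def b2(h):
    return h * h - h + 1.0 / 6.0


def b3(h):
    return h ** 3 - 1.5 * h * h + 0.5 * h


def log_gamma_expansion(x, h, m=2):
    """Truncated expansion of log Gamma(x + h) in powers of 1/x (assumed form, m <= 2)."""
    polys = {2: b2, 3: b3}
    total = 0.5 * math.log(2.0 * math.pi) + (x + h - 0.5) * math.log(x) - x
    for r in range(1, m + 1):
        total -= (-1) ** r * polys[r + 1](h) / (r * (r + 1) * x ** r)
    return total
```

With $h = 0$ and $m = 1$ the sum reduces to the familiar term $1/(12x)$ of Stirling's series, since $B_2(0) = \frac16$.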


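Taking $x = \rho x_k(1 - 2it)$, $\rho y_j(1 - 2it)$ and $h = \beta_k + \xi_k$, $\varepsilon_j + \eta_j$ in turn, we
obtain

(9) $\Phi(t) = Q - g(0) - \frac12 f \log(1 - 2it) + \sum_{r=1}^{m} \omega_r(1 - 2it)^{-r} + \sum_{k=1}^{a} O(x_k^{-(m+1)}) + \sum_{j=1}^{b} O(y_j^{-(m+1)})$,

where

(10) $f = -2\left[ \sum_{k=1}^{a} \xi_k - \sum_{j=1}^{b} \eta_j - \frac12(a - b) \right]$,

(11) $\omega_r = \dfrac{(-1)^{r+1}}{r(r+1)} \left\{ \sum_{k=1}^{a} \dfrac{B_{r+1}(\beta_k + \xi_k)}{(\rho x_k)^r} - \sum_{j=1}^{b} \dfrac{B_{r+1}(\varepsilon_j + \eta_j)}{(\rho y_j)^r} \right\}$,

(12) $Q = \frac12(a - b)\log 2\pi - \frac12 f \log \rho + \sum_{k=1}^{a} (x_k + \xi_k - \frac12)\log x_k - \sum_{j=1}^{b} (y_j + \eta_j - \frac12)\log y_j$.


† $R_{m+1}(x) = O(x^{-(m+1)})$ means $|x^{m+1}R_{m+1}(x)|$ is bounded as $|x| \to \infty$.

‡ This definition differs slightly from that of Whittaker and Watson [(1943), p. 126], who expand
$\tau(e^{h\tau} - 1)/(e^\tau - 1)$. If $B_r^*(h)$ is this second type of polynomial, $B_1(h) = B_1^*(h) - \frac12$, $B_{2r}(h) = B_{2r}^*(h) + (-1)^{r+1}B_r$,
where $B_r$ is the $r$th Bernoulli number, and $B_{2r+1}(h) = B_{2r+1}^*(h)$.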


In (14) we have expanded g(0) in the same way we expanded g{t) and have 
collected similar terms. 

Then 


(16) 則⑴ 

= (l-2t7)~ ^exp| D 叫（ 1-2 红 厂 


a- 2 叫卧 

X n (l - + JT^r - ) +R "m^ 

(l-2ity y [l + T,(t)+T 2 (t) + ■■ 


L ^r + ^m + lj 
f ^^(1-2/0 _2 

} 

+uo+d 


where 7 ； ⑴ is the term in the expansion with terms … oj r Sr , Ew, = r; for 
example, 

(17) ^i(0 = (o x [(l~2it)~ l - l], 

(18) T 2 (t) = oj 2 [(1- 2it)~ 2 _ 1] + 2if) _2 _ 2(1 _ 2/r)" 1 + l]. 

In most applications, we will have x k ^c k 6 and y } = dfi, where c k and d } 
will be constant and 6 will vary (i.e., will grow with the sample size). In this 
case if p is chosen so (1 - p)x k and (1 — p)yj .have limits, then 尺 ’ 二 +1 is 
0( 沒 -(m+i)). we collect in (16) all terms w [… (o s /, Lis t = r, because these 
terms are O(0~ r ). 




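It will be observed that $T_r(t)$ is a polynomial of degree $r$ in $(1 - 2it)^{-1}$ and
each term of $(1 - 2it)^{-\frac12 f}T_r(t)$ is a constant times $(1 - 2it)^{-\frac12 v}$ for an integral
$v$. We know that $(1 - 2it)^{-\frac12 v}$ is the characteristic function of the $\chi^2$-density
with $v$ degrees of freedom; that is,

(19) $g_v(z) = \dfrac{1}{2^{\frac12 v}\Gamma(\frac12 v)} z^{\frac12 v - 1}e^{-\frac12 z} = \dfrac{1}{2\pi}\int_{-\infty}^{\infty} (1 - 2it)^{-\frac12 v}e^{-itz}\, dt$.

Let

(20) $S_r(z) = \dfrac{1}{2\pi}\int_{-\infty}^{\infty} (1 - 2it)^{-\frac12 f}T_r(t)e^{-itz}\, dt$,

$R^{\mathrm{iv}}_{m+1} = \dfrac{1}{2\pi}\int_{-\infty}^{\infty} (1 - 2it)^{-\frac12 f}R'''_{m+1}e^{-itz}\, dt$.

Then the density of $\rho M$ is

(21) $\dfrac{1}{2\pi}\int_{-\infty}^{\infty} \phi(t)e^{-itz}\, dt = \sum_{r=0}^{m} S_r(z) + R^{\mathrm{iv}}_{m+1}$

$= g_f(z) + \omega_1[g_{f+2}(z) - g_f(z)]$

$+ \left\{ \omega_2[g_{f+4}(z) - g_f(z)] + \dfrac{\omega_1^2}{2}[g_{f+4}(z) - 2g_{f+2}(z) + g_f(z)] \right\}$

$+ \cdots + S_m(z) + R^{\mathrm{iv}}_{m+1}$.

Let

(22) $U_r(z_0) = \int_0^{z_0} S_r(z)\, dz$,

$R^{\mathrm{v}}_{m+1} = \int_0^{z_0} R^{\mathrm{iv}}_{m+1}\, dz$.

The cdf of $M$ is written in terms of the cdf of $\rho M$, which is the integral of
the density, namely,

(23) $\Pr\{M \le M_0\}$

$= \Pr\{\rho M \le \rho M_0\}$

$= \sum_{r=0}^{m} U_r(\rho M_0) + R^{\mathrm{v}}_{m+1}$

$= \Pr\{\chi_f^2 \le \rho M_0\} + \omega_1(\Pr\{\chi_{f+2}^2 \le \rho M_0\} - \Pr\{\chi_f^2 \le \rho M_0\})$

$+ \left[ \omega_2(\Pr\{\chi_{f+4}^2 \le \rho M_0\} - \Pr\{\chi_f^2 \le \rho M_0\}) + \dfrac{\omega_1^2}{2}(\Pr\{\chi_{f+4}^2 \le \rho M_0\} - 2\Pr\{\chi_{f+2}^2 \le \rho M_0\} + \Pr\{\chi_f^2 \le \rho M_0\}) \right]$

$+ \cdots + U_m(\rho M_0) + R^{\mathrm{v}}_{m+1}$.

The remainder $R^{\mathrm{v}}_{m+1}$ is $O(\theta^{-(m+1)})$; this last statement can be verified by
following the remainder terms along. (In fact, to make the proof rigorous one
needs to verify that each remainder is of the proper order in a uniform
sense.) In many cases it is desirable to choose $\rho$ so that $\omega_1 = 0$. In such a case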




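(24) $\mathcal{E}\lambda^h = \mathcal{E}U^{\frac12 Nh} = K \dfrac{\prod_{k=1}^{p} \Gamma[\frac12(N - q + 1 - k + Nh)]}{\prod_{j=1}^{p} \Gamma[\frac12(N - q_2 + 1 - j + Nh)]}$,

and this holds for all $h$ for which the gamma functions exist, including purely
imaginary $h$. We let $a = b = p$,

(25) $x_k = \frac12 N$, $\xi_k = \frac12(-q + 1 - k)$, $\beta_k = \frac12(1 - \rho)N$,

$y_j = \frac12 N$, $\eta_j = \frac12(-q_2 + 1 - j)$, $\varepsilon_j = \frac12(1 - \rho)N$.

We observe that

(26) $2\omega_1 = \sum_{k=1}^{p} \left\{ \dfrac{\{\frac12[(1 - \rho)N - q + 1 - k]\}^2 - \frac12[(1 - \rho)N - q + 1 - k]}{\frac12\rho N} - \dfrac{\{\frac12[(1 - \rho)N - q_2 + 1 - k]\}^2 - \frac12[(1 - \rho)N - q_2 + 1 - k]}{\frac12\rho N} \right\}$

$= \dfrac{1}{2\rho N}\sum_{k=1}^{p} \{-2[(1 - \rho)N - q_2 + 1 - k]q_1 + q_1^2 + 2q_1\}$

$= \dfrac{pq_1}{2\rho N}[-2(1 - \rho)N + 2q_2 - 2 + (p + 1) + q_1 + 2]$.

To make this zero, we require that

(27) $\rho = \dfrac{N - q_2 - \frac12(p + q_1 + 1)}{N}$.

Then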

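(28) $\Pr\left\{ -2\dfrac{k}{N}\log\lambda \le z \right\}$

$= \Pr\{-k \log U_{p,q_1,N-q} \le z\}$

$= \Pr\{\chi_{pq_1}^2 \le z\}$

$+ \dfrac{\gamma_2}{k^2}\left( \Pr\{\chi_{pq_1+4}^2 \le z\} - \Pr\{\chi_{pq_1}^2 \le z\} \right)$

$+ \dfrac{1}{k^4}\left[ \gamma_4\left( \Pr\{\chi_{pq_1+8}^2 \le z\} - \Pr\{\chi_{pq_1}^2 \le z\} \right) - \gamma_2^2\left( \Pr\{\chi_{pq_1+4}^2 \le z\} - \Pr\{\chi_{pq_1}^2 \le z\} \right) \right] + R_7''$,

where

(29) $k = \rho N = N - q_2 - \frac12(p + q_1 + 1) = n - \frac12(p - q_1 + 1)$,

(30) $\gamma_2 = \dfrac{pq_1(p^2 + q_1^2 - 5)}{48}$,

(31) $\gamma_4 = \dfrac{\gamma_2^2}{2} + \dfrac{pq_1}{1920}[3p^4 + 3q_1^4 + 10p^2q_1^2 - 50(p^2 + q_1^2) + 159]$.

Since $\lambda = U_{p,q_1,n}^{\frac12 N}$, where $n = N - q$, (28) gives $\Pr\{-k \log U_{p,q_1,n} \le z\}$.

Theorem 8.5.2. The cdf of $-k \log U_{p,q_1,n}$ is given by (28) with $k = n - \frac12(p - q_1 + 1)$,
and $\gamma_2$ and $\gamma_4$ given by (30) and (31), respectively. The
remainder term is $O(N^{-6})$.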

The coefficient $k = n - \frac12(p - q_1 + 1)$ is known as the Bartlett correction.
If the first term of (28) is used, the error is of the order $N^{-2}$; if the second,
$N^{-4}$; and if the third, $N^{-6}$. The second term is always negative and is
numerically maximum for $z = \sqrt{(pq_1 + 2)(pq_1)}$ ($= pq_1 + 1$, approximately).
For $p \ge 3$, $q_1 \ge 3$, we have $\gamma_2/k^2 \le [(p^2 + q_1^2)/k]^2/96$, and the contribution
of the second term lies between $-0.005[(p^2 + q_1^2)/k]^2$ and 0. For $p \ge 3$,
$q_1 \ge 3$, we have $\gamma_4 \le \gamma_2^2$, and the contribution of the third term is numerically
less than $(\gamma_2/k^2)^2$. A rough rule that may be followed is that use of the first
term is accurate to three decimal places if $p^2 + q_1^2 \le k/3$.

As an example of the calculation, consider the case of $p = 3$, $q_1 = 6$,
$N - q_2 = 24$, and $z = 26.0$ (the 10% significance point of $\chi_{18}^2$). In this case
$\gamma_2/k^2 = 0.048$ and the second term is $-0.007$; $\gamma_4/k^4 = 0.0015$ and the third
term is $-0.0001$. Thus the probability of $-19 \log U_{3,6,18} \le 26.0$ is 0.893 to
three decimal places.
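The three-term approximation of Theorem 8.5.2 can be programmed directly. The sketch below is our own illustration: it codes $k$, $\gamma_2$, and $\gamma_4$ from (29) to (31) as reconstructed in this section and obtains chi-square probabilities from the power series of the incomplete gamma function, so no external library is needed. Because the constants are recomputed from the formulas, the intermediate ratios need not agree digit for digit with the rounded figures quoted in the example.

```python
import math


def chi2_cdf(z, df):
    """Pr{chi-square(df) <= z} via the series for the incomplete gamma function."""
    a, x = df / 2.0, z / 2.0
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (a + k)
        total += term
        k += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma2(p, q1):
    return p * q1 * (p * p + q1 * q1 - 5) / 48.0


def gamma4(p, q1):
    g2 = gamma2(p, q1)
    return g2 * g2 / 2.0 + p * q1 / 1920.0 * (
        3 * p ** 4 + 3 * q1 ** 4 + 10 * p * p * q1 * q1
        - 50 * (p * p + q1 * q1) + 159)


def cdf_minus_k_log_u(z, p, q1, n, terms=3):
    """Approximate Pr{-k log U_{p,q1,n} <= z} using the first `terms` terms."""
    k = n - 0.5 * (p - q1 + 1)  # Bartlett correction
    f = p * q1
    c0, c4, c8 = chi2_cdf(z, f), chi2_cdf(z, f + 4), chi2_cdf(z, f + 8)
    g2, g4 = gamma2(p, q1), gamma4(p, q1)
    total = c0
    if terms >= 2:
        total += g2 / k ** 2 * (c4 - c0)
    if terms >= 3:
        total += (g4 * (c8 - c0) - g2 * g2 * (c4 - c0)) / k ** 4
    return total
```

For the example above one would call `cdf_minus_k_log_u(26.0, 3, 6, 18)`; the first term alone is the $\chi_{18}^2$ probability, and the later terms are small negative corrections.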

Since

(32) $-[n - \frac12(p - m + 1)]\log u_{p,m,n}(\alpha) = C_{p,m,n-p+1}(\alpha)\chi_{pm}^2(\alpha)$,

the proportional error in approximating the left-hand side by $\chi_{pm}^2(\alpha)$ is
$C_{p,m,n-p+1} - 1$. The proportional error increases slowly with $p$ and $m$.



















million of another function ot U p nKit in terms or ocm ui^nuu- 
Misiants can be adjusted so that the term after the leading one is 
4 . A good approximation is to consider 

1 - U l/S ks-r 
― u l , s — pfn 

im nnH tc — r decrees of freedom, where 



approximation is more 






linear l 




and we shall base test procedures on them. As was shown in Sec 
the hypothesis is P^P*, one can reformulate the hypothesis as 




8.6 OTHER CRITERIA FOR TESTING THE LINEAR HYPOTHESIS 
replacing x a by x a - Moreover, 

( 1 ) Pz a = P lZ ^ + P 2 z^ 

=-^12^2-M ：, ) + (P2 + Py^2^)z^ 

= p lZ f ) + 咤4 2 )， 

where E a z ： (I V a 2), = 0 and L a z ： (l) z* a w ， =A n . 2 . Then P, = P m 

Pla,- 
and

$KHK' = L = \begin{pmatrix} l_1 & 0 & \cdots & 0 \\ 0 & l_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & l_p \end{pmatrix}$,

where $l_1, \ldots, l_p$ are the roots of (2). (See Theorem A.2.2 of the Appendix.)

Let x u be an obsewaium from A/(P,z* (，) + 2), 

= 0 and L a z* (l) z* (1), =A n . 2 . The only functions of the 
and A u . 2 invariant under the transformations x* = x tJ + 
and i* = Kx are thp. roots nf (2). where G = and 












K~ l (KT\ or G ^K K) 

be written 


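(6) $\sum_{i=1}^{p} l_i = \operatorname{tr} L = \operatorname{tr} KHK' = \operatorname{tr} HK'K = \operatorname{tr} HG^{-1}$.


This criterion was suggested by Lawley (1938), Bartlett (1939), and Hotelling
(1947), (1951). The test procedure is to reject the hypothesis if (6) is greater
than a constant depending on $p$, $m$, and $n$.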


The general distribution of $\operatorname{tr} HG^{-1}$ cannot be characterized as easily as
that of $U_{p,m,n}$. In the case of $p = 2$, Hotelling (1951) obtained an explicit
expression for the distribution of $\operatorname{tr} HG^{-1} = l_1 + l_2$. A slightly different form
of this distribution is obtained from the density of the two roots $l_1$ and $l_2$ in
Chapter 13. It is

(7) Pr{tri/G- 1 < w} = /,, /(2 + 叫 (川 _ 1 ， ” — U 

一鸟 

r(»r ㈣ 

where $I_x(a, b)$ is the incomplete beta function, that is, the integral of $\beta(y; a, b)$
from 0 to $x$.

Constantine (1966) expressed the density of $\operatorname{tr} HG^{-1}$ as an infinite series












































8.7. TESTS OF HYPOTHESES ABOUT MATRICES OF REGRESSION
COEFFICIENTS AND CONFIDENCE REGIONS

8.7.1. Testing Hypotheses 

Suppose we are given a set of vector observations ..., with : 
ing fixed vectors Z\ ， ... ， Zn ， where x a is an observation from MP 
I-* q 、 ^ whprp R. anH 7 ⑴ , have n. 





















(yvi„-/vi n )(yvij"' 

(Pm-Pt)^n.2(P 1 n-Pt) ， (^^)" 


root criterion R, where R is the 


rNl^l =|(P in - Pt)^u-2(Pia _ Pi)' 


referred to the appropriate tables in Appendix 
proach to computing the. criterion. If we let y a 
- considered as an observation from 7V(Az a ， S) 
P* p 2 ). r Fhcn the null hypothesis is \ = 0, 






: so that 

\N± n \ _ , 、 

'n + (Pin - Pi)^n. 2 (P,n - P,)1 ~ Up ' 9 " n(a) - 

the confidence-region statement that satisfies 

_ \Ni n \ _ 

' n + (Pm-P,)^i l -2(P,a-P ； )1 "^ ， " ，，(a)， 







inequality can be used to deterrr 
the region. 


8.7.3. Simultaneous Confidence Intervals Based on the
Lawley-Hotelling Trace


Each test procedure implies a set of confidei 

ing trace criterion can be used to develop si 

for linear combinations of elements of p v 厂 

dence coefficient 1 - a is 


( 12 ) 




simultaneous confidence bounds from (17). From Lemma 5.3.2, we find for 
any vectors a and b 


(18) 

a Ga 11 - 

< ch,[(p in - PO^u^Pia- ^)'G-'\ a'Ga b'A^_b 
^r^M-a'Ga-b'A^b 

with probability $1 - \alpha$; the second inequality follows from Theorem A.2.4 of
the Appendix. Then a set of confidence intervals on all linear combinations
$a'\beta_1 b$ holding with confidence $1 - \alpha$ is

(19) a^b - a)-a'Ga-b'A^ <a%b 

+ ^ r P .m.n ( a ) ma7 3a h，A H 1 ：^ . 

The linear combinations are $a'\beta_1 b = \sum_{i=1}^{p}\sum_{h=1}^{q_1} a_i\beta_{ih}b_h$. If $a_1 = 1$, $a_i = 0$,
$i \neq 1$, and $b_1 = 1$, $b_h = 0$, $h \neq 1$, the linear combination is simply $\beta_{11}$. If
$a_1 = 1$, $a_i = 0$, $i \neq 1$, and $b_1 = 1$, $b_2 = -1$, $b_h = 0$, $h \neq 1, 2$, the linear combination
is $\beta_{11} - \beta_{12}$.







showed that under certain conditions the confidence sets based on the 
maximum root are smallest. [See also Wijsman (1980).] 


8.8. TESTING EQUALITY OF MEANS OF SEVERAL NORMAL 
DISTRIBUTIONS WITH COMMON COVARIANCE MATRIX 




























Then 



is to be compared with the significance point of F ;i 38 . This is significant at 
the 5% level. Our data show that there are differences between varieties.
















roots of (2) coincide with the nonzero characteristic roots of $X'(U - YY')^{-1}X$.
Let $V = (X, Y, U)$ and

(3) $M(V) = X'(U - YY')^{-1}X$.

The vector of ordered characteristic roots of $M(V)$ is denoted by

(4) $(\lambda_1, \ldots, \lambda_m)' = \lambda(M(V))$,

where $\lambda_1 \ge \cdots \ge \lambda_m \ge 0$. Since the inclusion of zero roots (when $m > p$)
causes no trouble in the sequel, we assume that the tests depend on
$\lambda(M(V))$.

The admissibility of these tests can be stated in terms of the geometric
characteristics of the acceptance regions. Let

(5) $R_{\le}^m = \{\lambda \in R^m \mid \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0\}$,

$R_+^m = \{\lambda \in R^m \mid \lambda_1 \ge 0, \ldots, \lambda_m \ge 0\}$.

I* thot ；f a cAt nf camnlp roots Isads to acceDtance of the 








Figure 8.4. Theorem 8.10.3.


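Theorem 8.10.3.

(11) $\lambda[M(pV_1 + qV_2)] \prec_w p\lambda[M(V_1)] + q\lambda[M(V_2)]$.

The proof of Theorem 8.10.3 (Figure 8.4) follows from the pair of
majorizations

(12) $\lambda[M(pV_1 + qV_2)] \prec_w \lambda[pM(V_1) + qM(V_2)] \prec_w p\lambda[M(V_1)] + q\lambda[M(V_2)]$.

The second majorization in (12) is a special case of the following lemma.

Lemma 8.10.1. For $A$ and $B$ symmetric,

(13) $\lambda(A + B) \prec_w \lambda(A) + \lambda(B)$.

Proof. By Corollary A.4.2 of the Appendix,

(14) $\sum_{i=1}^{k} \lambda_i(A + B) = \max_{R'R = I_k} \operatorname{tr} R'(A + B)R$

$\le \max_{R'R = I_k} \operatorname{tr} R'AR + \max_{R'R = I_k} \operatorname{tr} R'BR$

$= \sum_{i=1}^{k} \lambda_i(A) + \sum_{i=1}^{k} \lambda_i(B)$

$= \sum_{i=1}^{k} \{\lambda_i(A) + \lambda_i(B)\}$, $k = 1, \ldots, p$. ■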
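Lemma 8.10.1 can be illustrated for $2 \times 2$ symmetric matrices, where the characteristic roots have a closed form. In the sketch below weak majorization means that each partial sum of the ordered roots on the left does not exceed the corresponding partial sum on the right; the function names and the numerical tolerance are our own choices.

```python
import math


def eig_sym2(a):
    """Ordered characteristic roots of a symmetric 2 x 2 matrix."""
    mean = (a[0][0] + a[1][1]) / 2.0
    rad = math.hypot((a[0][0] - a[1][1]) / 2.0, a[0][1])
    return [mean + rad, mean - rad]


def weakly_majorized(x, y, tol=1e-12):
    """True if the partial sums of ordered x never exceed those of ordered y."""
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    sx = sy = 0.0
    for u, v in zip(xs, ys):
        sx += u
        sy += v
        if sx > sy + tol:
            return False
    return True


def lemma_holds(a, b):
    """Check lambda(A + B) weakly majorized by lambda(A) + lambda(B)."""
    s = [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]
    bound = [u + v for u, v in zip(eig_sym2(a), eig_sym2(b))]
    return weakly_majorized(eig_sym2(s), bound)
```

The last partial sum always holds with equality, because both sides then equal $\operatorname{tr}(A + B)$; the tolerance guards against rounding in that comparison.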










)< P M(V l )fqM(V 2 ), 

Y 2 ,U 2 X U.-Y^Xi, U 2 -Y 2 Y^>0, 0<p 

.3 show that 

Y^+qiU.-Y^)]-'. 

A _ Y.n) + q(U 2 - Y 2 Y{)]-\ P X l+ qX 2 ). 






sin's theorem. 





where © is symmetric. Suppose that {y\oi t y>c} is disjoint from A = 
{V\\{M{V))^A}. We want to show that in this case © is positive semidefi- 
nite. If this were not then 


or sufficiently la 
28) \(M(V)) 




as $\gamma \to \infty$. Therefore, $V \in A$ for sufficiently large $\gamma$. This is a contradiction.
Hence $\Theta$ is positive semidefinite.

Now let % correspond to where Then /+A© is 

positive definite and A + AO 尹 0 f or sufficiently large A. Hence 叫 + Aw g 
H - H 0 for sufficiently large A. ■ 

The preceding proof was suggested by Charles Stein. 

By Theorem 5.6.5, Theorem 8.10.3 and Lemma 8.10.8 now imply Theorem
8.10.2.

To obtain Theorem 8.10.1 from Theorem 8.10.2, we use the following 
lemmas. 












(36) 



origin. Let $f(x) \ge 0$ be a function such that (i) $f(x) = f(-x)$, (ii) $\{x \mid f(x) \ge u\} = K_u$
is convex for every $u$ ($0 < u < \infty$), and (iii) $\int_E f(x)\, dx < \infty$. Then

(37) $\int_E f(x + ky)\, dx \ge \int_E f(x + y)\, dx$

for $0 \le k \le 1$.
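A one-dimensional illustration of inequality (37) may help fix ideas. In the sketch below, which is our own example, $E$ is the interval $[-c, c]$ and $f$ is the standard normal density, which is symmetric and has convex level sets; the integral of $f(x + ky)$ over $E$ is then a difference of normal cdf values and decreases as $k$ goes from 0 to 1. The function names are assumptions.

```python
import math


def normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def shifted_mass(k, y, half_width=1.0):
    """Integral of the standard normal density f(x + k*y) over [-half_width, half_width]."""
    return normal_cdf(half_width + k * y) - normal_cdf(-half_width + k * y)
```

The largest value occurs at $k = 0$, where the interval is centered at the origin, and the value at any $0 \le k \le 1$ is at least the value at $k = 1$.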

The proof of Theorem 8.10.5 is based on the following lemma. 

Lemma 8.10.12. Let $E$, $F$ be convex and symmetric about the origin. Then

(38) $V\{(E + ky) \cap F\} \ge V\{(E + y) \cap F\}$,

where $0 \le k \le 1$ and $V$ denotes the $n$-dimensional volume.

Proof. Consider the set $\alpha(E + y) + (1 - \alpha)(E - y) = \alpha E + (1 - \alpha)E + (2\alpha - 1)y$,
which consists of points $\alpha(x + y) + (1 - \alpha)(z - y)$ with $x, z \in E$.
Let $\alpha_0 = (k + 1)/2$, so that $2\alpha_0 - 1 = k$. Then by convexity of $E$ we have

(39) $\alpha_0(E + y) + (1 - \alpha_0)(E - y) \subset E + ky$.

Hence by convexity of $F$

$\alpha_0[(E + y) \cap F] + (1 - \alpha_0)[(E - y) \cap F] \subset (E + ky) \cap F$

and

(40) $V\{\alpha_0[(E + y) \cap F] + (1 - \alpha_0)[(E - y) \cap F]\} \le V\{(E + ky) \cap F\}$.

Now by the Brunn-Minkowski inequality [e.g., Bonnesen and Fenchel (1948),
Section 48], we have

(41) $V^{1/n}\{\alpha_0[(E + y) \cap F] + (1 - \alpha_0)[(E - y) \cap F]\}$

$\ge \alpha_0V^{1/n}\{(E + y) \cap F\} + (1 - \alpha_0)V^{1/n}\{(E - y) \cap F\}$

$= \alpha_0V^{1/n}\{(E + y) \cap F\} + (1 - \alpha_0)V^{1/n}\{(-E + y) \cap (-F)\}$

$= V^{1/n}\{(E + y) \cap F\}$.

The last equality follows from the symmetry of $E$ and $F$. ■

Proof of Theorem 8.10.5. Let

(42) $H(u) = V\{(E + ky) \cap K_u\}$,

(43) $H^*(u) = V\{(E + y) \cap K_u\}$.

Then



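(45) $B\Sigma B' = I_p$, $FF' = I_m$,

$B\Xi F' = (D_\nu^{\frac12}, 0)$, $p \le m$,

where $D_\nu = \operatorname{diag}(\nu_1, \ldots, \nu_p)$.

Proof. We prove this for the case $p \le m$ and $\nu_p > 0$. Other cases can be
proved similarly. By Theorem A.2.2 of the Appendix there is a matrix $B$ such
that

(47) $B\Sigma B' = I_p$, $B\Xi\Xi'B' = D_\nu$.

Let

(48) $F_1 = D_\nu^{-\frac12}B\Xi$ ($p \times m$).

Then

(49) $F_1F_1' = I_p$.

Let $F' = (F_1', F_2')$ be a full $m \times m$ orthogonal matrix. Then

(50) $B\Xi F_2' = D_\nu^{\frac12}F_1F_2' = 0$

and

(51) $B\Xi F' = B\Xi(F_1', F_2') = B\Xi(\Xi'B'D_\nu^{-\frac12}, F_2') = (D_\nu^{\frac12}, 0)$. ■

Now let

(52) $U = BXF'$, $V = BZ$.

Then the columns of $U$, $V$ are independently normally distributed with
covariance matrix $I$ and means when $p \le m$

(53) $\mathcal{E}U = (D_\nu^{\frac12}, 0)$,

$\mathcal{E}V = 0$.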




roots 


natural to use /-, which ( 
theorem is given by Das 


Theorem 8.10.6. If, 



other column vectors of l 
each 


Proof. Since VV is ui 
by — 1, the acceptance 
column vectors of U. Nc 

(54) f(U,V) 

= {27r)-'^ n+m)p 


Apnlving Theorem 8.10 」 


Corollary 8.10.3. If 













letely specified. In Section 8.3 a more general hypothesis 
P partitioned as P = (P 1? P 2 ). However, as shown in that 
sformation (4)，the hypothesis P! = Pt can be reduced to a 
bnn (1) above. 

S= L (x a -Bz a )(x a -Bz a y=Ni a , 

a=l 

- P)' 

Under the conditions of Theorem 8.11.1 the limiting distri- 





distribution 
















kll.2 agrees with the first term of the asymptotic expa 
:n Theore m 8.5.2 for sampling from a normal dist] 
confidence procedures discussed in Sections 8.3 and 8.^ 
this y-distribution. 











$\sum_{\alpha=1}^{N} (z_\alpha - \bar{z})(z_\alpha - \bar{z})' = \begin{pmatrix} 1.8311 & -0.3589 & -0.0125 & -0.0244 & 1.6379 & 0.5057 & 0 \\ -0.3589 & 8.8102 & -0.3469 & 0.0352 & 0.7920 & 0.2173 & 0 \\ -0.0125 & -0.3469 & 1.5818 & -0.0415 & -1.4278 & -0.4753 & 0 \\ -0.0244 & 0.0352 & -0.0415 & 0.0258 & 0.0043 & 0.0154 & 0 \\ 1.6379 & 0.7920 & -1.4278 & 0.0043 & 3.7248 & 0.9120 & 0 \\ 0.5057 & 0.2173 & -0.4753 & 0.0154 & 0.9120 & 0.3828 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$,

$\sum_{\alpha=1}^{N} (z_\alpha - \bar{z})(x_\alpha - \bar{x})' = \begin{pmatrix} 0.2501 & 2.6691 \\ -1.5136 & -2.0617 \\ 0.5007 & -0.9503 \\ -0.0421 & -0.0187 \\ -0.1914 & 3.4020 \\ -0.1586 & 1.1663 \\ 0 & 0 \end{pmatrix}$.

(a) Estimate the regression of $x_1$ and $x_2$ on $z_1$, $z_5$, $z_6$, and $z_7$.

(b) Estimate the regression on all seven variables.

(c) Test the hypothesis that the regression on $z_2$, $z_3$, and $z_4$ is 0.

8.6. (Sec. 8.3) Let $q = 2$, $z_{1\alpha} = w_\alpha$ (scalar), $z_{2\alpha} = 1$. Show that the $U$-statistic for
testing the hypothesis $\beta_1 = 0$ is a monotonic function of a $T^2$-statistic, and give
the $T^2$-statistic in a simple form. (See Problem 5.1.)

8.7. (Sec. 8.3) Let z qa = \, let q 2 = 1, and let 

= /,； = 1 ，…， q ': 

Prove that 

( 自 in - Pi)(^n ~A I 2 A^A 21 )(p ia - PJ =( 自 in - f 


8.8. (Sec. 8.3) Let $q_1 = q_2$. How do you test the hypothesis $\beta_1 = \beta_2$?

8.9. (Sec. 8.3) Prove 

P'«= -A l2 A i2 'z[ 

=(C「C 2 ^22^2l)(^ll ~^l2 A 22 A 2l) ' - 






with means 0 and covariance 
rial) density proportional to 

i/+cc'r 』 

(a) Show the measures are fi 

(b) Show that the inequality 


Hence the likelihood ratio tesl 
)Admissibility of 












CHAPTER 9 


Testing Independence of 
Sets of Variates 


9.1. INTRODUCTION 

In this section we divide a set of p variates with a joint normal distribution 
into q subsets and ask whether the q subsets are mutually independent; this 
is equivalent to testing the hypothesis that each variable in one subset is 
uncorrelated with each variable in the others. We find the likelihood ratio 
criterion for this hypothesis, the moments of the criterion under the null 
hypothesis, some particular distributions, and an asymptotic expansion of the 
distribution. 

The likelihood ratio criterion is invariant under linear transformations 
within sets; another such criterion is developed. Alternative test procedures 
are step-down procedures, which are not invariant, but are flexible. In the 
case of two sets, independence of the two sets is equivalent to the regression 
of one on the other being 0; the criteria for Chapter 8 are available. Some 
optimal properties of the likelihood ratio test are treated. 


9.2. THE LIKELIHOOD RATIO CRITERION FOR TESTING 
INDEPENDENCE OF SETS OF VARIATES 

Let the $p$-component vector $X$ be distributed according to $N(\mu, \Sigma)$. We
partitioned into submatrices Then

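(25) $A_{ij}^* = \sum_{\alpha} (x_\alpha^{*(i)} - \bar{x}^{*(i)})(x_\alpha^{*(j)} - \bar{x}^{*(j)})' = C_i A_{ij} C_j'$

and $A^* = CAC'$. Thus

(26) $V^* = \frac{|A^*|}{\prod_{i=1}^{q} |A_{ii}^*|} = \frac{|CAC'|}{\prod_{i=1}^{q} |C_i A_{ii} C_i'|} = \frac{|C| \cdot |A| \cdot |C'|}{\prod_{i=1}^{q} |C_i| \cdot |A_{ii}| \cdot |C_i'|} = \frac{|A|}{\prod_{i=1}^{q} |A_{ii}|} = V$

for $|C| = \prod_{i=1}^{q} |C_i|$. Thus the test is invariant with respect to linear transformations
within each set.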

Narain (1950) showed that the test based on V is strictly unbiased; that is, 
the probability of rejecting the null hypothesis is greater than the significance 
level if the hypothesis is not true. [See also Daly (1940).] 


9.3. THE DISTRIBUTION OF THE LIKELIHOOD RATIO CRITERION 
WHEN THE NULL HYPOTHESIS IS TRUE 

9.3.1. Characterization of the Distribution









which is i his is the 

conditioning vector, an( 
covariance matrix. 

Theorem 9.3.2. The

distribution ofV 2 V 3 ••- V t 










9.4. AN ASYMPTOTIC EXPANSION OF THE DISTRIBUTION OF THE 
LIKELIHOOD RATIO CRITERION 



  $v$       $\gamma_2$    $N$    $k$     $\gamma_2/k^2$
12.592      11/24     15    71/6    0.0033
18.307      15/8      15    69/6    0.0142
24.996      235/48    15    67/6    0.0393
24.996      235/48    16    73/6    0.0331


have

$f = \tfrac{1}{2} p(p - 1)$,

$\gamma_2 = \frac{p(p-1)}{288}(2p^2 - 2p - 13)$,

$\gamma_3 = \frac{p(p-1)}{3240}(p - 2)(2p - 1)(p + 1)$;

given by Box (1949). If $p_i = 2$ ($p = 2q$),

$f = 2q(q - 1)$,

$\gamma_2 = \frac{q(q-1)}{72}(8q^2 - 8q - 7)$.

ives an indication of the order of approximation of (6) for 
case v is chosen so that the first term is 0.95. 

; approximate distributions given in Sections 8.5.3 and 8.5.4 are 
also Nagao (1973c).] 


Second Term: -0.0007, -0.0021, -0.0043, -0.0036


CRITERIA 



/, that is, C 1 = and let B Q be the matrix with B n as the 

block and O’s as off-diagonal blocks. Then B q A q Bq = I and 


0 




^22-^21^11 


’Ui2"’22 

0 


B 22 A 2q B， qq 


B qq A q2 B' 22 - 0 j 

x is invariant with respect to transformations (24) of Section 9.2 
on A. A different choice of B it amounts to multiplying (1) on the 
and on the right by Q f Qj where Q 0 is a matrix with orthogonal 
ocks and off-diagonal blocks of 0’s. A test procedure should reject 
ypothesis if some measure of the numerical values of the elements 
too large. The likelihood ratio criterion is the N/2 power oi 
0 )B' 0 + I\=\B 0 AB' 0 \. / 

r measure, suggested by Nagao (1973a), is 


-A 0 )B' 0 ] 2 ^ ^tr[(A -4。) V] 2 = H/Wo 1 -/) 2 
= 5 E 

ij=^ 


this measure is the average of the Bartlett- 
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The sequence of subvectors and the sequence of components within each 
subvector is at the discretion of the investigator. 

The criterion $V_i$ for testing $H_i$ is $V_i = \prod_{k=1}^{p_i} X_{ik}$, and the criterion for the null
hypothesis $H$ is

(8) $V = \prod_{i=2}^{q} V_i = \prod_{i=2}^{q} \prod_{k=1}^{p_i} X_{ik}$.

These are the random variables described in Theorem 9.3.3. 


9.7. AN EXAMPLE 


We take the following example from an industrial time study [Abruzzi
(1950)]. The purpose of the study was to investigate the length of time taken
by various operators in a garment factory to do several elements of a pressing
operation. The entire pressing operation was divided into the following six
elements:

1. Pick up and position garment.
2. Press and repress short dart.
3. Reposition garment on ironing board.
4. Press three-quarters of length of long dart.
5. Press balance of long dart.
6. Hang garment on rack.

In this case $x_\alpha$ is the vector of measurements on individual $\alpha$. The component
$x_{i\alpha}$ is the time taken to do the $i$th element of the operation. $N$ is 76.
The data (in seconds) are summarized in the sample mean vector and
covariance matrix:


(1) $\bar{x} = (9.47,\ 25.56,\ 13.25,\ 31.44,\ 27.29,\ 8.80)'$,

(2) $S = \begin{pmatrix}
2.57 & 0.85 & 1.56 & 1.79 & 1.33 & 0.42 \\
0.85 & 37.00 & 3.34 & 13.47 & 7.59 & 0.52 \\
1.56 & 3.34 & 8.44 & 5.77 & 2.00 & 0.50 \\
1.79 & 13.47 & 5.77 & 34.01 & 10.50 & 1.77 \\
1.33 & 7.59 & 2.00 & 10.50 & 23.01 & 3.43 \\
0.42 & 0.52 & 0.50 & 1.77 & 3.43 & 4.59
\end{pmatrix}$.

The sample standard deviations are (1.604, 6.041, 2.903, 5.832, 4.798, 2.141).
The sample correlation matrix is 


$R = \begin{pmatrix}
1.000 & 0.088 & 0.334 & 0.191 & 0.173 & 0.123 \\
0.088 & 1.000 & 0.186 & 0.384 & 0.262 & 0.040 \\
0.334 & 0.186 & 1.000 & 0.343 & 0.144 & 0.080 \\
0.191 & 0.384 & 0.343 & 1.000 & 0.375 & 0.142 \\
0.173 & 0.262 & 0.144 & 0.375 & 1.000 & 0.334 \\
0.123 & 0.040 & 0.080 & 0.142 & 0.334 & 1.000
\end{pmatrix}$.


The investigators are interested in testing the hypothesis that the six 
variates are mutually independent. It often happens in time studies that a 
new operation is proposed in which the elements are combined in a different
way; the new operation may use some of the elements several times and some
elements may be omitted. If the times for the different elements in the 
operation for which data are available are independent, it may reasonably be 
assumed that they will be independent in a new operation. Then the 
distribution of time for the new operation can be estimated by using the 
means and variances of the individual items. 

In this problem the criterion $V$ is $V = |R| = 0.472$. Since the sample size is
large we can use asymptotic theory: $k = 433/6$, $f = 15$, and $-k \log V = 54.1$.

Since the significance point for the $\chi^2$-distribution with 15 degrees of
freedom is 30.6 at the 0.01 significance level, we find the result significant.
We reject the hypothesis of independence; we cannot consider the times of
the elements independent.
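
The arithmetic of this test can be retraced with a short script. The following is a minimal sketch, not the author's computation: it assumes the constants for the case of $p$ single variates, $f = \tfrac{1}{2}p(p-1)$ and $k = N - (2p+11)/6$, and it uses the correlation matrix as printed to three decimals, so the determinant may differ slightly from the 0.472 quoted in the text. The helper name `determinant` is hypothetical.

```python
import math

# Sample correlation matrix R of the six pressing elements (three-decimal values).
R = [
    [1.000, 0.088, 0.334, 0.191, 0.173, 0.123],
    [0.088, 1.000, 0.186, 0.384, 0.262, 0.040],
    [0.334, 0.186, 1.000, 0.343, 0.144, 0.080],
    [0.191, 0.384, 0.343, 1.000, 0.375, 0.142],
    [0.173, 0.262, 0.144, 0.375, 1.000, 0.334],
    [0.123, 0.040, 0.080, 0.142, 0.334, 1.000],
]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


N, p = 76, 6
f = p * (p - 1) // 2            # degrees of freedom (assumed formula for p_i = 1)
k = N - (2 * p + 11) / 6        # multiplier of -log V (assumed formula for p_i = 1)
V = determinant(R)              # criterion V = |R|
statistic = -k * math.log(V)    # compared with the chi-square point 30.6

print(f, k, V, statistic)
```

The statistic is compared with the 1% point 30.6 of the $\chi^2$-distribution with 15 degrees of freedom, as in the text.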


9.8. THE CASE OF TWO SETS OF VARIATES 


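In the case of two sets of variates ($q = 2$), the random vector $X$, the
observation vector $x_\alpha$, the mean vector $\mu$, and the covariance matrix $\Sigma$ are
partitioned as follows:

(1) $X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}$, $x_\alpha = \begin{pmatrix} x_\alpha^{(1)} \\ x_\alpha^{(2)} \end{pmatrix}$, $\mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix}$, $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$.

The null hypothesis of independence specifies that $\Sigma_{12} = 0$, that is, that
$\Sigma$ is of the form

(2) $\Sigma_0 = \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}$.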

































To define $\Pi_0$ let $\Sigma$ have the form of (6) of Section 9.2. Let

(6) + + i = 1 ， ." ， 9, 

where the p r component random vector K (,) has density proportional to 
〔 1 + f ,(0V 0 r and the conditional distribution of Y ； given F (1) = r (0 is 
M 0，（1 + 一 〜⑴)/#]， and let (V v Y,\...,(V q ,Y q ) be mutually independent. 
Then the denominator of (1) is 

(7) ft const^,r = exp[ ~^rA, i + Nx^'x^)} 

/=1 

=const| fl U„l ' - j exp[ - i(tr A + iVFi)]. 

The left-hand side of (1) is then proportional to the square root of

$\prod_{i=1}^{q} |A_{ii}| / |A|$. ■

This proof has been adapted from that of Kiefer and Schwartz (1965). 

9.10. MONOTONICITY OF POWER FUNCTIONS OF TESTS OF 
INDEPENDENCE OF SETS 

L et z a = [Z^ n， , a = 1， be distributed according to 



⑴ 


/- AAO = N(&y a ，I - R 2 \ Then x* =(/ pi - R 2 )~ ^x a is distributed 
according to N(My a , f p ) where 

(4) Z^diagK ，...，〜■)， 

W(l-p , 2 ) 5 ， i = 

Note that 5, 2 is a characteristic root of X 12 X 22 1 ^ 2 i^n 1 2 » where X n . 2 = 5 n 
— ^ 12^22 ^21- 

Invariant tests depend only on the (sample) canonical correlation coeffi¬ 
cients r ( -= where 

(5) c; = A,.[(n* , ) _1 (x*r)(iY , ) _1 (Kr ：, )]. 

Let 

S A =A r *r(iT , )" 1 Kt* ， ) 

( 6 ) s e =A■*A■* , -s A =A■*[/-r(yT , ) -1 y]A■* , . 






Then 


⑺ = 1^7： * 

Now given $Y$, the problem reduces to the MANOVA problem and we can
apply Theorem 8.10.6 as follows. There is an orthogonal transformation
(Section 8.3.3) that carries $X^*$ to $(U, V)$ such that $S_h = UU'$, $S_e = VV'$,
$U = (u_1, \ldots, u_{p_2})$, $V$ is $p_1 \times (n - p_2)$, $u_i$ has the distribution $N(\tau_i \varepsilon_i, I)$,
$i = 1, \ldots, p_1$ ($\varepsilon_i$ being the $i$th column of $I$), and $N(0, I)$, $i = p_1 + 1, \ldots, p_2$,
and the columns of $V$ are independently distributed according to $N(0, I)$.
Then $c_1, \ldots, c_{p_1}$ are the characteristic roots of $UU'(VV')^{-1}$, and their
distribution depends on the characteristic roots of $MYY'M'$, say, $\tau_1^2, \ldots, \tau_{p_1}^2$. Now
from Theorem 8.10.6, we obtain the following lemma.

Lemma 9.10.2. If the acceptance region of an invariant test is convex in
each column of $U$, given $V$ and the other columns of $U$, then the conditional
power given $Y$ increases in each characteristic root of $MYY'M'$.

Lemma 9.10.3. If $A \ge B$, then $\lambda_i(A) \ge \lambda_i(B)$.

Proof. By the minimax property of the characteristic roots [see, e.g.,
Courant and Hilbert (1953)],

(8) $\lambda_i(A) = \max_{S_i} \min_{x \in S_i} \frac{x'Ax}{x'x} \ge \max_{S_i} \min_{x \in S_i} \frac{x'Bx}{x'x} = \lambda_i(B)$,

where $S_i$ ranges over $i$-dimensional subspaces. ■
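
As a numerical illustration of the lemma (not part of the proof), the following sketch compares the characteristic roots of two symmetric $2 \times 2$ matrices $A$ and $B$ whose difference is positive semidefinite. The matrices and the helper `eigenvalues_2x2` are hypothetical choices made only for this example.

```python
import math


def eigenvalues_2x2(a11, a12, a22):
    """Characteristic roots of [[a11, a12], [a12, a22]], largest first."""
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean + radius, mean - radius


# B is symmetric; D is positive semidefinite (trace 0.5 > 0, determinant 0.05 > 0),
# so A = B + D satisfies A >= B in the sense used by the lemma.
B = (2.0, 0.5, 1.0)
D = (0.3, 0.1, 0.2)
A = tuple(b + d for b, d in zip(B, D))

roots_A = eigenvalues_2x2(*A)
roots_B = eigenvalues_2x2(*B)

# Each ordered root of A should be at least the corresponding root of B.
print(roots_A, roots_B)
```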

Now Lemma 9.10.3 applied to $MYY'M'$ shows that for every $j$, $\tau_j^2$ is an
increasing function of $\delta_i = \rho_i/(1 - \rho_i^2)^{1/2}$ and hence of $\rho_i$. Since the marginal
distribution of $Y$ does not depend on the $\rho_i$'s, by taking the unconditional
power we obtain the following theorem.

Theorem 9.10.1. An invariant test for which the acceptance region is convex
in each column of $U$ for each set of fixed $V$ and other columns of $U$ has a power
function that is monotonically increasing in each $\rho_i$.

9.11. ELLIPTICALLY CONTOURED DISTRIBUTIONS 
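9.11.1. Observations Elliptically Contoured

Let $x_1, \ldots, x_N$ be $N$ observations on a random vector $X$ with density

(1) $|\Lambda|^{-1/2} g[(x - \nu)'\Lambda^{-1}(x - \nu)]$,

where $\mathcal{E}R^4 < \infty$ and $R^2 = (X - \nu)'\Lambda^{-1}(X - \nu)$. Then $\mathcal{E}X = \nu$ and
$\mathcal{E}(X - \nu)(X - \nu)' = \Sigma = (\mathcal{E}R^2/p)\Lambda$. Let

(2) $\bar{x} = \frac{1}{N}\sum_{\alpha=1}^{N} x_\alpha$, $S = \frac{1}{N-1}\sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})'$.

Then

(3) $\sqrt{N}\,\mathrm{vec}(S - \Sigma) \xrightarrow{d} N[0, (\kappa + 1)(I_{p^2} + K_{pp})(\Sigma \otimes \Sigma) + \kappa\,\mathrm{vec}\,\Sigma\,(\mathrm{vec}\,\Sigma)']$,

where $1 + \kappa = p\,\mathcal{E}R^4/[(p + 2)(\mathcal{E}R^2)^2]$.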

The likelihood ratio criterion for testing the null hypothesis $\Sigma_{ij} = 0$, $i \ne j$,
is the $N/2$th power of $U = \prod_{i=2}^{q} V_i$, where $V_i$ is the $U$-criterion for testing the
null hypothesis $\Sigma_{i1} = 0, \ldots, \Sigma_{i,i-1} = 0$ and is given by (1) and (6) of Section






lave the density g(tr YY r ). The matrix Y is vector-spheri 
> spherical and has the stochastic representation vec 
re R 2 = (vec y)' vec Y=trYY , and vec U pxN has the unif 
he unit sphere (vec U pxN ) r yQC U pxN = 1. (We use the n 
tinauish f om IJ uniform on the snace IW = f \ 


ctenstic roots 1 and one root U. ihen A =XMX and 
le likelihood function is 

Vr n/ V{trA-'[/J + /V(x-v)(x-v) ( ]}. 

1 the vector x are sufficient statistics, and the likelihood 
the hypothesis H is {\A\/Y\^ ={ \A r \) N/1 , the same as for 
derson and Fang (1990b). 

.Let f(X) be a vector-valued function of X (p XN) such 














CHAPTER 10 


Testing Hypotheses of Equality 
of Covariance Matrices and 
Equality of Mean Vectors and 
Covariance Matrices 


10.1. INTRODUCTION 

In this chapter we study the problems of testing hypotheses of equality of 
covariance matrices and equality of both covariance matrices and mean 
vectors. In each case (except one) the problem and tests considered are 
multivariate generalizations of a univariate problem and test. Many of the 
tests are likelihood ratio tests or modifications of likelihood ratio tests. 
Invariance considerations lead to other test procedures. 

First, we consider equality of covariance matrices and equality of covariance
matrices and mean vectors of several populations without specifying the
common covariance matrix and mean


















The application of the tests for elliptically contoured distributions
is treated in Section 10.11.


10.2. CRITERIA FOR TESTING EQUALITY OF SEVERAL 
COVARIANCE MATRICES 

In this section we study several normal distributions and consider using a 
" 5, one from each population, to test the hypothesis that 

matrices of these populations are equal. Let x { J\ a = 1 ，…， 
U be an observation from the gth population ⑷， 2 g ). We w 
hypothesis 







He argues that if N v say, is small, A x is given t 
other effects may be missed. Perlman (1980) has s 
V\ is unbiased. 

If one assumes 

(15) = 

where z { a 8) consists of k g components, and if oi 
defining 

(16) 尖 =E (4 S) - 民 4 s) )(x„- 

a=l 

one uses (10) with n R = N g - k^. 

The statistical problem (parameter space fl 


\ ab for H ab are uniquely defined for the 


入 a 入 fc. 





























The $h$th moment of $W$ can be found from its representation in Theorem
10.4.3. We have 

(16) 

= n n /吨"' + ... ，、 h 

A-( 二 中 (》, + … +〜_ 1 + i-/) + y(A^_.U] 

_ M \/-i 「[Mi + … +,, . c -i +1 - 0 ] 「[办 s + 1 - ; ’)] 

r[4(/；, + i-/ + %/i)]r[K»i + … +« e ) - i +1] 

r [士 (〜+ … +n g ) + */!(〜 + … +N g ) + 1 - i] 

^ 『[!('!! + — +n s ) + 去 /i(M + … +N g ) + 1 - i] 

,U r[\{ n , + -+n g ) + l-i\ 

中 … i+ ... + 〜 +1 —,.)] \ 

r [ 士 (《, + … +〃. v + l - ；) + |/!(乂 + … +N g )\ I 

r^( n + \-i + hN)]r[{(N-i)} 

r[{(« + i-0]r[l(w + ^-0j 

A f A rfU^ + ^-OM rU(w-0] 

= M\ S U r[i(Af,- ； )] I r[Kw + ^-o] 

r,(^) ^ r p [iK + n)l 

- r ； (i« + \hN) ^ 明 〜）- 

We summarize in the following theorem: 

Theorem 10.4.4. Let V x be the criterion defined by (10) of Section 10.2 for 

一 n pi e 














1 nc UaMC UUlllU^l ui “wwww ‘“ -- J 厶 r '•» ■* 

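of Section 8.5 with $\beta_k = \tfrac{1}{2}(1 - \rho)x_k$ and $\varepsilon_j = \tfrac{1}{2}(1 - \rho)y_j$. To make $\omega_1 = 0$, we
take

(10) $\rho = 1 - \left(\sum_{g=1}^{q} \frac{1}{N_g} - \frac{1}{N}\right)\frac{2p^2 + 9p + 11}{6(q - 1)(p + 3)}$.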

Then 

( ⑴销 ! H-•+” ㈣ - 6 (i _ p) 2 ( H. 

The asymptotic expansion of the distribution of $-2\rho \log \lambda$ is

(12) $\Pr\{-2\rho \log \lambda \le z\} = \Pr\{\chi_f^2 \le z\} + \omega_2[\Pr\{\chi_{f+4}^2 \le z\} - \Pr\{\chi_f^2 \le z\}] + O(n^{-3})$.
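
A two-term expansion of this form is easy to evaluate when the degrees of freedom are even, since the $\chi^2$ cdf then has a closed form as a finite Poisson sum. The sketch below is an illustration under that restriction only; the function names are hypothetical, and the values $f = 12$ and $\omega_2 = 0.0022$ are borrowed from the numerical example in this section.

```python
import math


def chi2_cdf_even(z, df):
    """Pr{chi-square with df degrees of freedom <= z}, for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if z <= 0:
        return 0.0
    half = z / 2.0
    term, total = 1.0, 1.0
    # Upper tail is exp(-z/2) * sum_{j=0}^{df/2 - 1} (z/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def expansion_cdf(z, f, omega2):
    """First two terms of the expansion: chi2_f plus the omega_2 correction."""
    base = chi2_cdf_even(z, f)
    return base + omega2 * (chi2_cdf_even(z, f + 4) - base)


# 21.026 is the tabulated 5% point of chi-square with 12 degrees of freedom.
print(chi2_cdf_even(21.026, 12), expansion_cdf(21.026, 12, 0.0022))
```

Because $\omega_2$ is small here, the correction term changes the probability only in the fourth decimal place, which is why the text treats the statistic as $\chi^2$ directly.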

Box (1949) considered the case of A* in considerable detail. In addition to 
this expansion he considered the use of (13) of Section 8.6. He also gave an 
F-approximation. 

As an example, we use one given by E. S. Pearson and Wilks (1933). The 


10.6 THE CASE OF TWO POPULATIONS 
and the sum of these is

(14) $\begin{pmatrix} 636.165 & 1697.52 \\ 1697.52 & 7653.44 \end{pmatrix}$.

The $-\log \lambda_1^*$ is 5.399. To use the asymptotic expansion we find $\rho = 152/165 = 0.9212$
and $\omega_2 = 0.0022$. Since $\omega_2$ is small, we can consider $-2\rho \log \lambda_1^*$ as
$\chi^2$ with 12 degrees of freedom. Our observed criterion, therefore, is clearly
not significant.

Table B.5 [due to Korin (1969)] gives 5% significance points for $-2 \log \lambda_1^*$
for $N_1 = \cdots = N_q$ for various $q$, small values of $N_g$, and $p = 2(1)6$.

The limiting distribution of the criterion (19) of Section 10.1 is also $\chi_f^2$. An
asymptotic expansion of the distribution was given by Nagao (1973b) to terms
of order $1/n$ involving $\chi^2$-distributions with $f$, $f+2$, $f+4$, and $f+6$
















(Note $\{1/[q(M - 1)]\}G$ and $(1/q)H$ maximize the likelihood without regard
to $\Theta$ being positive definite.) Let $l_i^* = l_i$ if $l_i > 1$, and let $l_i^* = 1$ if $l_i \le 1$.
Then the likelihood ratio criterion for testing the hypothesis $\Theta = 0$ against
the alternative $\Theta$ positive semidefinite and $\Theta \ne 0$ is

(14) M^ m pY\ -— ~ : — = M^ qMk n - 

where $k$ is the number of roots of (13) greater than 1. [See Anderson (1946b),
(1984a), (1989a), Morris and Olkin (1964), and Klotz and Putter (1969).]


10.7. TESTING THE HYPOTHESIS THAT A COVARIANCE MATRIX IS 
PROPORTIONAL TO A GIVEN MATRIX; THE SPHERICITY TEST 


10.7.1. The Hypothesis 
In the canonical form the hypothesis $H$ is a combination of the hypothesis
$H_1$: $\Sigma$ is diagonal or the components of $X$ are independent and $H_2$: the
diagonal elements of $\Sigma$ are equal given that $\Sigma$ is diagonal or the variances of
the components of $X$ are equal given that the components are independent.
Thus by Lemma 10.3.1 the likelihood ratio criterion $\lambda$ for $H$ is the product of
the criterion $\lambda_1$ for $H_1$ and $\lambda_2$ for $H_2$. From Section 9.2 we see that the
criterion for $H_1$ is

(4) $\lambda_1 = \frac{|A|^{N/2}}{\prod_{i=1}^{p} a_{ii}^{N/2}} = |r_{ij}|^{N/2}$,

where

(5) $A = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})' = (a_{ij})$

and $r_{ij} = a_{ij}/\sqrt{a_{ii}a_{jj}}$. We use the results of Section 10.2 to obtain $\lambda_2$ by
considering the $i$th component of $x_\alpha$ as the $\alpha$th observation from the $i$th

- 30 ( 1 - 30 ' 


50(1 - 50 ' 
This is of the form of (1), Section 8.5, with

(20) $a = p$, $x_k = \tfrac{1}{2}n$, $\xi_k = \tfrac{1}{2}(1 - k)$, $k = 1, \ldots, p$,
     $b = 1$, $y_1 = \tfrac{1}{2}np$, $\eta_1 = 0$.

Thus the expansion of Section 8.5 is valid with $f = \tfrac{1}{2}p(p + 1) - 1$. To make
$\omega_1 = 0$ we take

(21) $1 - \rho = \frac{2p^2 + p + 2}{6pn}$.

Then

(22) $\omega_2 = \frac{(p + 2)(p - 1)(p - 2)(2p^3 + 6p^2 + 3p + 2)}{288p^2n^2\rho^2}$.

Thus the cdf of $W$ is found from

(23) $\Pr\{-2\rho \log \lambda \le z\} = \Pr\{-n\rho \log W \le z\}$
     $= \Pr\{\chi_f^2 \le z\} + \omega_2(\Pr\{\chi_{f+4}^2 \le z\} - \Pr\{\chi_f^2 \le z\}) + O(n^{-3})$.
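
The constants entering the expansion for the sphericity criterion depend only on $p$ and $n$, so they can be tabulated directly. The sketch below simply evaluates the formulas for $f$, $\rho$, and $\omega_2$ stated above; the function name `sphericity_constants` and the trial values of $p$ and $n$ are hypothetical.

```python
def sphericity_constants(p, n):
    """Return (f, rho, omega2) for the sphericity test with p variates and n = N - 1."""
    f = p * (p + 1) // 2 - 1                       # degrees of freedom
    rho = 1.0 - (2 * p**2 + p + 2) / (6.0 * p * n)  # chosen so that omega_1 = 0
    omega2 = (
        (p + 2) * (p - 1) * (p - 2) * (2 * p**3 + 6 * p**2 + 3 * p + 2)
        / (288.0 * p**2 * n**2 * rho**2)
    )
    return f, rho, omega2


# Illustrative values only: p = 3 variates, n = 20.
f, rho, omega2 = sphericity_constants(3, 20)
print(f, rho, omega2)
```

Note that the factor $(p - 2)$ makes $\omega_2$ vanish for $p = 2$, so in that case the second term of the expansion drops out.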


■ investigator 
































dnu A p are tne largest and smallest characteristic roots of 2. Then 

( 32 ) ^<^ 2 )< 1 /, 

is a confidence interval for all characteristic roots of $\Sigma$ with confidence at
least $1 - \varepsilon$. In Section 11.6 we give tighter bounds on $\lambda(\Sigma)$ with exact
confidence.
10.9. TESTING THE HYPOTHESIS THAT A MEAN VECTOR AND
A COVARIANCE MATRIX ARE EQUAL TO A GIVEN VECTOR
AND MATRIX

In Chapter 3 we pointed out that if $\Psi$ is known, $(\bar{y} - \nu_0)'\Psi_0^{-1}(\bar{y} - \nu_0)$ is
suitable for testing

(1) $H_2$: $\nu = \nu_0$, given $\Psi = \Psi_0$.

Now let us combine $H_1$ of Section 10.8 and $H_2$, and test

(2) $H$: $\nu = \nu_0$, $\Psi = \Psi_0$,

on the basis of a sample $y_1, \ldots, y_N$ from $N(\nu, \Psi)$.

Let

(3) $X = C(Y - \nu_0)$,

where

(4) $C\Psi_0C' = I$.

Then $x_1, \ldots, x_N$ constitutes a sample from $N(\mu, \Sigma)$, and the hypothesis is

(5) $H$: $\mu = 0$, $\Sigma = I$.

The likelihood ratio criterion for $H_2$: $\mu = 0$, given $\Sigma = I$, is

(6) $\lambda_2 = e^{-\frac{1}{2}N\bar{x}'\bar{x}}$.

The likelihood ratio criterion for $H$ is (by Lemma 10.3.1)

(7) $\lambda = \lambda_1\lambda_2 = \left(\frac{e}{N}\right)^{pN/2} |A|^{N/2} e^{-\frac{1}{2}[\mathrm{tr}\,A + N\bar{x}'\bar{x}]}$.
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the hth moment of 入 is 


( 9 ) 


S\ h = S\ l {Sk\ = 


I 2e y pNh 1 +Nh)] 

~ 1 ^ 0 ~ 


under the null hypothesis. Then

(10) $-2 \log \lambda = -2 \log \lambda_1 - 2 \log \lambda_2$

has asymptotically the $\chi^2$-distribution with $f = p(p + 1)/2 + p$ degrees of
freedom. In fact, an asymptotic expansion of the distribution [Davis (1971)] of
$-2\rho \log \lambda$ is

(11) $\Pr\{-2\rho \log \lambda \le z\} = \Pr\{\chi_f^2 \le z\} + \frac{\gamma_2}{\rho^2N^2}(\Pr\{\chi_{f+4}^2 \le z\} - \Pr\{\chi_f^2 \le z\}) + O(N^{-3})$,

where

(12) $\rho = 1 - \frac{2p^2 + 9p - 11}{6N(p + 3)}$,

(13) $\gamma_2 = \frac{p(2p^4 + 18p^3 + 49p^2 + 36p - 13)}{288(p + 3)}$.

Nagarsenker and Pillai (1974) used the moments to derive exact distributions 
and tabulated the 5% and 1% significance points for p = 2(1)6 and N = 
4(1)20(2)40(5)100. 

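Now let us return to the observations $y_1, \ldots, y_N$. Then

(14) $\sum_{\alpha} x_\alpha'x_\alpha = \sum_{\alpha} (y_\alpha - \nu_0)'C'C(y_\alpha - \nu_0)$
     $= \mathrm{tr}\,A + N\bar{x}'\bar{x}$
     $= \mathrm{tr}(B\Psi_0^{-1}) + N(\bar{y} - \nu_0)'\Psi_0^{-1}(\bar{y} - \nu_0)$

and

(15) $|A| = |B\Psi_0^{-1}|$.

Theorem 10.9.1. Given the $p$-component observation vectors $y_1, \ldots, y_N$ from
$N(\nu, \Psi)$, the likelihood ratio criterion for testing the hypothesis $H$: $\nu = \nu_0$,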





where the d x r„ mainx - b 6 

n =/v -l,the ^-component random vectorhas the conditional normal 
distribution with mean 0 and covariance matrix (V^)U r - C//, 

C O-CJ- 1 given C r and (C 1； /»),-,(C , ： y^>) are independently^d«- 
trlhut^H As we shall see, we need to choose Q1,,tahle 


,ir»tc»fTP»rc r. 


that the integral of |/+C g C；| ^ is tinite 11 

tor of the Bayes ratio is 

(3) const Q/:.. jjI + C g C' g \^ 

■exp|-i + 

.(/ + c g c ； )[xi*>-(/ + c x q)- 1 c^]| 

• |/ + c K q,r^|/-q,(/ + q,c;) _1 c g | : 


10.10 ADMISSIBILITY OF TESTS 


const n j* 二 … 广 ① exp — * ix^(I + C g C g )x^ 


-2y« ), C ； E dy ⑻ dC g 


■- const Y[ exp I - ^ 




/： 


•exp {- \N g (y^- Cp ⑷ )’(y«) - C' g x^) - |tr C s A g C s } dy^dC g 
=const n ^p{-^[trA g +N g x {8) 'x i8) ] ^\A g \~^ 8 . 

Under the null hypothesis let 


(4) 


= [(/ + CC , )" 1 Cy ( * , ,(/ + CC , )"'] 
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1， ■…仏 and by ⑴ of Section 10.2. Let S s = (l/n g )A g , where n g = 
N k - 1, and 5 = (l/n)A, where n = L q g={ n g . 

' Since the likelihood ratio criterion \ is invariant under the transforma¬ 
tion x is) = CX {S) 4- v u) , under the null hypothesis we can take = *" = 

=i and v ⑴ ="• = v tg) = 0. Then 


(2) -21og A! = - | E ^ - AHoglijj 

= 一 1 E N g log| I + (i s n - /) I - ^ lQ g| ^ - 7 ) l] 

==-I E 斗 1 ■( 玄 s n - 功 - I tr (^«n _/ ) + 0 />(^« -3 )] 

-/v[tr(i„-/)-itr(i„-/) 2 + 0,(N- 3 )] 

= iL/V s tr(i gn -/) 2 -|/Vtr(i„-/) 2 -HO,(/V- 3 ) 

5 =1 

- 3N[vec(2 w -l)\ vec( 2 w - / ) +^(/V' 3 ). 


Bv Theorem 3.6.2 

(3) /A4vec(S fi -I p )^N[Q,(k+ 1 )( V + &) + « vec / p (vec /,,)，], 

and / ； C S 5 = N s t g(l , g = 1 , ... ， are independent. Let N g = k g N, g=\,.. ： ,q, 
Yf[ =x k, = \, and let N (x>. In terms of this asymptotic theory the limiting 
distribution of ycc(S x vec(5 9 -/) is the same as the distribution of 

, 1、 _/ ^« /> ^ r> r> *«.u ^ Q Q ¥*or\1o^<=»H Hv ( W 4 - 1V / ? 4 - 



Let y g = vec(t gfl -/)and y= vecCS,-/). Then y = Z\^{N g /N)y g and 

( 4 ) ~^Og\ = {j ： N s (y s -y)'(y g ~-y) 

g=l 

= LN g (y g -y)(y g -yy 

= l tr | E N s y s y s ~^y 

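Let $Q$ be a $q \times q$ orthogonal matrix with last column $(\sqrt{N_1/N}, \ldots, \sqrt{N_q/N})'$.
Define

(5) $(w_1, \ldots, w_q) = (\sqrt{N_1}\,y_1, \ldots, \sqrt{N_q}\,y_q)Q$.

Then $w_q = \sqrt{N}\,\bar{y}$ and

(6) $\sum_{g=1}^{q} N_g y_g y_g' - N\bar{y}\bar{y}' = \sum_{g=1}^{q-1} w_g w_g'$.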

In these terms 

9-1 

( 7 ) ~ 2 log A t = I £ w f g w g 4 - O p (N_ 3 ), 







on nonnormality. 





Proof. Let (ly^X^- n g I) = W g , g= 1,2. Then 


+ ^ r，_ 占’ 卜 x ^ f W2+ ° M), 


V^i+MA +^ 2 -(«i+ n 2 )l] = y/^^7 % + Wl+ 0/7 ⑴. 


(11) 


n x +n 2 1 y n l + n 2 


By application of Lemma 10.11.1 in succession to $A_1$ and $A_1 + A_2$, to
$A_1 + A_2$ and $A_1 + A_2 + A_3$, etc., we establish that $A_1A^{-1}, A_2A^{-1}, \ldots, A_qA^{-1}$
are independent of $A = A_1 + \cdots + A_q$. It follows that $V_1$ and $V_2$ are
asymptotically independent.

Theorem 10.11.2. When = … =and jx (1) = == jx (g) , 


(12) -21og W= -21og \.-2\og A 2 


[( k + 1 ) +PK/2] + Xp^iy 


10.11 ELLIPTICALLY CONTOURED DISTRIBUTIONS

The hypothesis of sphericity is that $\Sigma = \sigma^2 I$ (or $\Lambda = \lambda I$). The criterion is
$\lambda_1\lambda_2$, where

( 13 ) Ai = (n&r ， 入 2 = 

The first factor is the criterion for independence of the components of $X$, and
the second is that the variances of the components are equal. For the first we
set $q = p$ and $p_i = 1$ in Theorem 9.10, and for the second we set $q = p$ and
$p = 1$. Thus

(14) -21og(A(A 2 ) (1 + K)xp-i + 1(3k + 2)^_,. 

10.11.2. Elliptically Contoured Matrix Distributions 
Consider the density 

(15) fl I A s l - w */ 2 g[ t r E A- (^ ( *' - V^)[X^ - 

= niA,|- w « /2 g|trEA ； U g+ ⑴)\(”)| 

In this density $(A_g, \bar{x}^{(g)})$, $g = 1, \ldots, q$, is a sufficient set of statistics, and the
likelihood ratio criterion is (8) of Section 10.2, the same as for normality
[Anderson and Fang (1990b)].

t Theorem 10.11.3. Let f(X) be a vector-valued function of X = 


(W 


. . 



















CHAPTER 11 


Principal Components 


11.1. INTRODUCTION 

Principal components are linear combinations of random or statistical
variables which have special properties in terms of variances. For example, the
first principal component is the normalized linear combination (the sum of
squares of the coefficients being one) with maximum variance. In effect,
transforming the original vector variable to the vector of principal
components amounts to a rotation of coordinate axes to a new coordinate system
that has inherent statistical properties. This choosing of a coordinate system
is to be contrasted with the many problems treated previously where the
coordinate system is irrelevant.

The principal components turn out to be the characteristic vectors of the 
covariance matrix. Thus the study of principal components can be considered 
as putting into statistical terms the usual developments of characteristic roots 
and vectors (for positive semidefinite matrices). 

From the point of view of statistical theory, the set of principal compo¬ 
nents yields a convenient set of coordinates, and the accompanying variances 
of the components characterize their statistical properties. In statistical 
practice, the method of principal components is used to find the linear 

combinations with large variance. In many exploratory studies the number of 

variables under consideration is too large to handle. Since it is the deviations 
in these studies that are of interest, a way of reducing the number of 
variables to be treated is to discard the linear combinations which have small 
variances and study only those with large variances. For example, a physical 
anthropologist may make dozens of measurements of lengths and breadths of 
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(Section 11.6); exact confidence bounds are found for the characteristic roots 
of a covariance matrix. In Section 11.7 we consider other tests of hypotheses 
about these roots. 


11.2. DEFINITION OF PRINCIPAL COMPONENTS 
IN THE POPULATION 


Suppose the random vector $X$ of $p$ components has the covariance matrix $\Sigma$.
Since we shall be interested only in variances and covariances in this chapter, 



be given to the principal components. 

In the following treatment we shall not use the usual theory of characteristic
roots and vectors; as a matter of fact, that theory will be derived implicitly.
The treatment will include the cases where $\Sigma$ is singular (i.e., positive
semidefinite) and where $\Sigma$ has multiple roots.


/jorp (1) = p , 

orthogonal t< 





























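(19) $|\Sigma - \lambda I| = |\beta'| \cdot |\Sigma - \lambda I| \cdot |\beta| = |\beta'\Sigma\beta - \lambda\beta'\beta| = |\Lambda - \lambda I| = \prod_{i=1}^{p} (\lambda_{(i)} - \lambda)$,

we see that the roots of (19) are the diagonal elements of $\Lambda$; that is,
$\lambda_{(1)} = \lambda_1, \lambda_{(2)} = \lambda_2, \ldots, \lambda_{(p)} = \lambda_p$.

We have proved the following theorem:

Theorem 11.2.1. Let the $p$-component random vector $X$ have $\mathcal{E}X = 0$ and
$\mathcal{E}XX' = \Sigma$. Then there exists an orthogonal linear transformation

(20) $U = \beta'X$

such that the covariance matrix of $U$ is $\mathcal{E}UU' = \Lambda$ and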
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irresponding one of Y in ounces. The covariance matrix of Y is = 
A = = say. Then analysis of Y into principal components 






Let $\Delta\gamma = \alpha$; that is, $\gamma'Y = \gamma'\Delta X = \alpha'X$. Then (31) results from maximizing
$\mathcal{E}(\alpha'X)^2 = \alpha'\Sigma\alpha$ relative to $\alpha'\Delta^{-2}\alpha$. This last quadratic form is a weighted
sum of squares, the weights being the diagonal elements of $\Delta^{-2}$.

It might be noted that if $\Delta^{-2}$ is taken to be the matrix

0 ' 

0 


11.3. MAXIMUM LIKELIHOOD ESTIMATORS OF THE PRINCIPAL
COMPONENTS AND THEIR VARIANCES

A primary problem of statistical inference in principal component analysis is
to estimate the vectors $\beta^{(1)}, \ldots, \beta^{(p)}$ and the scalars $\lambda_1, \ldots, \lambda_p$. We apply the
algebra of the preceding section to an estimate of the covariance matrix.

Theorem 11.3.1. Let $x_1, \ldots, x_N$ be $N$ ($> p$) observations from $N(\mu, \Sigma)$,
where $\Sigma$ is a matrix with $p$ different characteristic roots. Then a set of maximum


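(32) $\begin{pmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_{pp} \end{pmatrix}$,

then $\Psi$ is the matrix of correlations.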


Theorem 11.2.1 









































accurate to the sixth place) $l_1 = 0.487875$. The normalized eighth iterated
vector is our estimate of $\beta^{(1)}$, namely,

(3) $b^{(1)} = \begin{pmatrix} 0.6867244 \\ 0.3053463 \\ 0.6236628 \\ 0.2149837 \end{pmatrix}$.

This vector agrees with the normalized seventh iterate to about one unit in
the sixth place. It should be pointed out that $l_1$ and $b^{(1)}$ have to be calculated
more accurately than $l_2$ and $b^{(2)}$, and so forth. The trace of $S$ is 0.624824,
which is the sum of the roots. Thus $l_1$ is more than three times the sum of
the other roots.

We next compute

(4) $S_2 = S - l_1b^{(1)}b^{(1)'} = \begin{pmatrix}
0.0363559 & -0.0171179 & -0.0260502 & -0.0162472 \\
-0.0171179 & 0.0529813 & -0.0102546 & 0.0091777 \\
-0.0260502 & -0.0102546 & 0.0310544 & 0.0076890 \\
-0.0162472 & 0.0091777 & 0.0076890 & 0.0165574
\end{pmatrix}$,

and iterate $z^{(i)} = S_2z^{(i-1)}$, using $z^{(0)'} = (0, 1, 0, 0)$. (In the actual computation
$S_2$ was multiplied by 10 and the first row and column were multiplied by $-1$.)
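
The iteration on the deflated matrix can be sketched as a plain power method. This is an illustration only, not the hand computation described in the text: it renormalizes the vector at every step and estimates the root by a Rayleigh quotient, and the function names and the number of steps are arbitrary choices.

```python
import math

# Deflated matrix S_2 = S - l_1 b(1) b(1)' as printed in (4).
S2 = [
    [0.0363559, -0.0171179, -0.0260502, -0.0162472],
    [-0.0171179, 0.0529813, -0.0102546, 0.0091777],
    [-0.0260502, -0.0102546, 0.0310544, 0.0076890],
    [-0.0162472, 0.0091777, 0.0076890, 0.0165574],
]


def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def power_iteration(m, start, steps=200):
    """Return (largest root, normalized vector) by repeated multiplication."""
    z = start[:]
    root = 0.0
    for _ in range(steps):
        w = mat_vec(m, z)
        norm = math.sqrt(sum(x * x for x in w))
        z = [x / norm for x in w]
        # Rayleigh quotient z'Mz with z'z = 1 estimates the characteristic root.
        root = sum(a * b for a, b in zip(z, mat_vec(m, z)))
    return root, z


l2, b2 = power_iteration(S2, [0.0, 1.0, 0.0, 0.0])
print(l2, b2)
```

Convergence is governed by the ratio of the second root to the third, which is why this stage proceeds more slowly than the iteration for the first root.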

>t proceed as rapidly; as will be seen, the 
1.32. On the last iteration, the ratios agree 
gnificant figure. We obtain / 2 = 0.0723828 


The results may be summarized as follows: 
(6) $(l_1, l_2, l_3, l_4) = (0.4879, 0.0724, 0.0548, 0.0098)$,

(7) $B = \begin{pmatrix}
0.6867 & -0.6690 & -0.2651 & 0.1023 \\
0.3053 & 0.5675 & -0.7296 & -0.2289 \\
0.6237 & 0.3433 & 0.6272 & -0.3160 \\
0.2150 & 0.3353 & 0.0637 & 0.9150
\end{pmatrix}$.

The sum of the four roots is $\sum_{i=1}^{4} l_i = 0.6249$, compared with the trace of the
sample covariance matrix, $\mathrm{tr}\,S = 0.624824$. The first accounts for 78% of the
total variance in the four measurements; the last accounts for a little more
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PRINCIPAL COMPONENTS 


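the limiting distribution $N(0, 2\lambda_i^2)$. The covariances of $g^{(1)}, \ldots, g^{(p)}$ in the
limiting distribution are

(1) $\mathcal{AC}(g^{(i)}) = \sum_{k \ne i} \frac{\lambda_i\lambda_k}{(\lambda_i - \lambda_k)^2}\beta^{(k)}\beta^{(k)'}$,

(2) $\mathcal{AC}(g^{(i)}, g^{(j)}) = -\frac{\lambda_i\lambda_j}{(\lambda_i - \lambda_j)^2}\beta^{(j)}\beta^{(i)'}$, $i \ne j$.

See Theorem 13.5.1. ■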

In making inferences about a single ordered root, one treats $l_i$ as
approximately normal with mean $\lambda_i$ and variance $2\lambda_i^2/n$. Since $l_i$ is a consistent
estimate of $\lambda_i$, the limiting distribution of



is $N(0, 1)$. A two-tailed test of the hypothesis $\lambda_i = \lambda_i^0$ has the (asymptotic)
acceptance region

(4) $-z(\varepsilon) \le \sqrt{\frac{n}{2}}\,\frac{l_i - \lambda_i^0}{\lambda_i^0} \le z(\varepsilon)$,

where the value of the $N(0, 1)$ distribution beyond $z(\varepsilon)$ is $\tfrac{1}{2}\varepsilon$. The interval
(4) can be inverted to give a confidence interval for $\lambda_i$ with confidence $1 - \varepsilon$:

(5) $\frac{l_i}{1 + \sqrt{2/n}\,z(\varepsilon)} \le \lambda_i \le \frac{l_i}{1 - \sqrt{2/n}\,z(\varepsilon)}$.

Note that the confidence coefficient should be taken large enough so
$\sqrt{2/n}\,z(\varepsilon) < 1$. Alternatively, one can use the fact that the limiting
distribution of $\sqrt{n}(\log l_i - \log \lambda_i)$ is $N(0, 2)$ by Theorem 4.2.3.

Inference about components of a vector $\beta^{(i)}$ can be based on treating $b^{(i)}$
as being approximately normal with mean $\beta^{(i)}$ and (singular) covariance
matrix $1/n$ times (1).
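
The large-sample interval (5) for a single root can be computed as follows. This is a minimal sketch: the function name is hypothetical, and the inputs $l_i = 0.4879$ (the first root of the example in Section 11.5), $n = 49$, and $\varepsilon = 0.05$ are assumed purely for illustration, since the sample size of that example is not restated here.

```python
import math
from statistics import NormalDist


def root_confidence_interval(l_i, n, eps):
    """Asymptotic confidence interval (5) for a characteristic root."""
    z = NormalDist().inv_cdf(1.0 - eps / 2.0)   # upper eps/2 point of N(0, 1)
    h = math.sqrt(2.0 / n) * z
    if h >= 1.0:
        # The interval is only meaningful when sqrt(2/n) * z(eps) < 1.
        raise ValueError("sqrt(2/n) * z(eps) must be less than 1")
    return l_i / (1.0 + h), l_i / (1.0 - h)


lower, upper = root_confidence_interval(0.4879, 49, 0.05)
print(lower, upper)
```

The interval is not symmetric about $l_i$, reflecting the fact that the variance of $l_i$ grows with $\lambda_i$.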

11.6.2. Confidence Region for a Characteristic Vector 




i (/) = n(b^ - p(-))^p*A*- 2 P- r {b {i) - 














11.7. TESTING HYPOTHESES ABOUT 
ROOTS OF A COVARIANCE MATRIX 



principal components if 
The covariance of $g_i$ and $g_j$ is

(3) $\mathcal{AC}(g_i, g_j) = -(1 + \kappa)\frac{\lambda_i\lambda_j}{(\lambda_i - \lambda_j)^2}\beta_j\beta_i'$.

For inference about a single ordered root the limiting standard normal
distribution of $\sqrt{N}(l_i - \lambda_i)/(\sqrt{2 + 3\kappa}\,l_i)$ can be used.

For inference about a single vector the right-hand side of (11) in Section
11.6.2 can be used with $S$ replaced by $(1 + \kappa)S$ and $S^{-1}$ by $S^{-1}/(1 + \kappa)$.

It is shown in Section 13.7.1 that the limiting distribution of the logarithm
of the likelihood ratio criterion for testing the equality of the $q = p - m$
smallest roots is the distribution of $(1 + \kappa)\chi_{q(q+1)/2-1}^2$.

11.8.2. Elliptically Contoured Matrix Distributions 

Suppose the density of (x lt ...,x N )is 

where $A = (X - \bar{x}\varepsilon_N')(X - \bar{x}\varepsilon_N')' = nS$ and $n = N - 1$. Thus $\bar{x}$ and $A$ are a
sufficient set of statistics.

Now consider $A = YY'$ having the density $g(\mathrm{tr}\,A)$. Let $A = BLB'$, where $L$
is diagonal with diagonal elements $l_1 > \cdots > l_p$ and $B$ is orthogonal with
> 0. Then L and B are independent; the roots have the density 

(18) of Section 13.7，and the matrix B has the conditional Haar invariant 
distribution. 


PROBLEMS 


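11.1. (Sec. 11.2) Prove that the characteristic vectors of

$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$

are

$\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$ and $\begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}$,

corresponding to roots $1 + \rho$ and $1 - \rho$.

11.2. (Sec. 11.2) Verify that the proof of Theorem 11.2.1 yields a proof of Theorem
A.2.1 of the Appendix for any real symmetric matrix.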
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( 23 ) 0=e j y^ f X 22 y ij) =e j . 

Thus (20) and (21) are simply (11) and (12) or alternatively (13). We therefore
take the largest $\lambda_i$, say, $\lambda^{(r+1)}$, such that there is a solution to (13) satisfying
(1), (4), (15), and (17) for $i = 1, \ldots, r$. Let this solution be $\alpha^{(r+1)}$, $\gamma^{(r+1)}$, and
let $U_{r+1} = \alpha^{(r+1)'}X^{(1)}$ and $V_{r+1} = \gamma^{(r+1)'}X^{(2)}$.

This procedure is continued step by step as long as successive solutions 























gives the desired result with sums starting with i = 2 . 

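We can derive a single matrix equation for $\alpha$ or $\gamma$. If we multiply (11) by $\lambda$
and (12) by $\Sigma_{22}^{-1}$, we have

(48) $\lambda\Sigma_{12}\gamma = \lambda^2\Sigma_{11}\alpha$,

(49) $\Sigma_{22}^{-1}\Sigma_{21}\alpha = \lambda\gamma$.

Substitution from (49) into (48) gives

(50) $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\alpha = \lambda^2\Sigma_{11}\alpha$

or

(51) $(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} - \lambda^2\Sigma_{11})\alpha = 0$.

The quantities $\lambda_1^2, \ldots, \lambda_{p_1}^2$ satisfy

(52) $|\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} - \nu\Sigma_{11}| = 0$,

and $\alpha^{(1)}, \ldots, \alpha^{(p_1)}$ satisfy (51) for $\lambda^2 = \lambda_1^2, \ldots, \lambda_{p_1}^2$, respectively. The similar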
























' 12 (V(»)=V? n (V ⑴)， 

t 21 ( s ，.)) y 22 ( s 2C 。'))， 

,« 0) ) = 1, {S 2 c^)'R 22 {S 2 c^)^\. 

elopments a geometric interpretation. The rows of 
is $\chi^2$ with $p_1p_2$ degrees of freedom. (See Section 9.4.) Note that it is
approximately

(3) $N\sum_{i=1}^{p_1} r_i^2 = N\,\mathrm{tr}\,A_{11}^{-1}A_{12}A_{22}^{-1}A_{21}$,

which is $N$ times Nagao's criterion [(2) of Section 9.5].





丨 = 0, and so forth. In this procedure one 


test the hypothesis p』= 0. The relevant asymptotic distribution will' be 
discussed in Section 12.4.2. 


12.4.2. Distributions of Canonical Correlations 


that $\Sigma_{12} = 0$, that is, all the population correlations are 0. The density when
some population correlations are different from 0 has been given by
Constantine (1963) in terms of a hypergeometric function of two matrix arguments.
The large-sample theory is more manageable. Suppose the first $k$ canonical




T— 


^-» ； exp(-lE^ +1 2,.) 

- ft n (n). 

/ = *+ 1 i.j--=k+\ 

(11) of Section 13.3 of the characteristic roots of a 
ix with distribution W( 1 ，一 — k). Note that the nor- 




section 




Rao [(1952), p. 245]

















■ 151.12,183.84,149.24 )， 


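$$S = \begin{pmatrix}
95.2933 & 52.8683 & 69.6617 & 46.1117 \\
52.8683 & 54.3600 & 51.3117 & 35.0533 \\
69.6617 & 51.3117 & 100.8067 & 56.5400 \\
46.1117 & 35.0533 & 56.5400 & 45.0233
\end{pmatrix}.$$

The matrix of correlations is

$$R = \begin{pmatrix}
1.0000 & 0.7346 & 0.7108 & 0.7040 \\
0.7346 & 1.0000 & 0.6932 & 0.7086 \\
0.7108 & 0.6932 & 1.0000 & 0.8392 \\
0.7040 & 0.7086 & 0.8392 & 1.0000
\end{pmatrix}.$$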

All of the correlations are about 0.7 except for the correlation between the
two measurements on second sons. In particular, $R_{12}$ is nearly of rank one,
and hence the second canonical correlation will be near zero.

We compute

$$R_{22}^{-1} R_{21} = \begin{pmatrix} 0.405769 & 0.333205 \\ 0.363480 & 0.428976 \end{pmatrix},$$

$$R_{12} R_{22}^{-1} R_{21} = \begin{pmatrix} 0.544311 & 0.538841 \\ 0.538841 & 0.534950 \end{pmatrix}.$$


The determinantal equation is

$$0 = \begin{vmatrix} 0.544311 - 1.0000\nu & 0.538841 - 0.7346\nu \\ 0.538841 - 0.7346\nu & 0.534950 - 1.0000\nu \end{vmatrix} = 0.460363\nu^2 - 0.287596\nu + 0.000830.$$

The roots are 0.621816 and 0.002900; thus $l_1 = 0.788553$ and $l_2 = 0.053852$.
Corresponding to these roots are the vectors


Rao's computations are in error: his last "difference" is incorrect.


We apply $(1/l_i) R_{22}^{-1} R_{21}$ to $a^{(i)}$ to obtain


(2)= 1.767 281 ， 

\ —1.757288, 


10.0402 0 1 

0 6.70991 


We check these computations by calculating 


( 10 ) TRn l R n (S 2 c^ 






to the smaller root means that the iteration decreases the accuracy instead of 
increasing it. Our final results are 


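(11)
$$l_1 = 0.789, \qquad l_2 = 0.054,$$
$$a^{(1)} = \begin{pmatrix} 0.0566 \\ 0.0707 \end{pmatrix}, \qquad a^{(2)} = \begin{pmatrix} 0.1400 \\ -0.1870 \end{pmatrix},$$
$$c^{(1)} = \begin{pmatrix} 0.0502 \\ 0.0802 \end{pmatrix}, \qquad c^{(2)} = \begin{pmatrix} 0.1760 \\ -0.2619 \end{pmatrix}.$$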


in each set is approximately proportional to the difference of the two
standardized measurements.
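The arithmetic of this example can be retraced numerically. The sketch below is an illustration and not part of the original computation; it assumes the correlation matrix is partitioned into $R_{11}$, $R_{12}$, and $R_{22}$ as in the example, and the helper names (`inv2`, `matmul`, `canonical_correlations_2x2`) are hypothetical. It expands the $2 \times 2$ determinantal equation into a quadratic in $\nu$ and takes the square roots of its two roots.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Product of two small matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def canonical_correlations_2x2(r11, r12, r22):
    """Square roots of the roots nu of |R12 R22^{-1} R21 - nu R11| = 0."""
    m = matmul(matmul(r12, inv2(r22)), transpose(r12))
    # The 2x2 determinant expands to q2*nu^2 - q1*nu + q0.
    q2 = r11[0][0] * r11[1][1] - r11[0][1] * r11[1][0]
    q1 = (m[0][0] * r11[1][1] + m[1][1] * r11[0][0]
          - m[0][1] * r11[1][0] - m[1][0] * r11[0][1])
    q0 = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(q1 * q1 - 4.0 * q2 * q0)
    roots = [(q1 + disc) / (2.0 * q2), (q1 - disc) / (2.0 * q2)]
    return [math.sqrt(nu) for nu in roots]


# Blocks of the correlation matrix quoted in the example.
R11 = [[1.0000, 0.7346], [0.7346, 1.0000]]
R12 = [[0.7108, 0.7040], [0.6932, 0.7086]]
R22 = [[1.0000, 0.8392], [0.8392, 1.0000]]

l1, l2 = canonical_correlations_2x2(R11, R12, R22)
print(round(l1, 3), round(l2, 3))
```

Starting from the four-decimal correlations, the larger value should agree with $l_1 \approx 0.789$. The smaller one is a difference of nearly equal quantities, so it can differ from the printed $l_2$ in the later decimal places.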






















Consider a linear combination of X^\ say = Then has 

variance a^a and expected value 

(5) = + 


















-无 ⑴-㊁ (岭 ) _叫1 一无 ⑴一自 «) 一叫]’ 

^ 22 ^ 21 ) = $11 - ^ 12 ^^ 2 * 1 ^21 » 


re defined as before. (It is convenient to divide by 

i N\ the latter would yield maximum likelihood 


f ⑺ and ( 8 ) are 


12^22*^21 _ ^(^11 - 〜巧 1 〜)]吞， 

S 22 P f -k^\ 

12^22 ^21 - 女 ($11 _ *^12 ^22'^21 )1 - 

of ( 20 ) estimate the roots 〜之…之 & of ( 8 )，and 
»ns a (1) ,...,a (Pl) of(19), normalized by 5 (,>/ 企 5 (,) = 1 ， 
Then c 0) = (1/ ^k^)p l a (j) estimates 7 0 ), and n -= 
i sample canonical variates are a U), x { ^ and c (;), (^ 


can 










approximately the $\chi^2$-distribution with $(p_1 - k)(p_2 - k)$ degrees of
freedom.

The determination of the rank as any number between 0 and $p_1$ can be
made as in Section 12.4.

12.6.5. Linear Functional Relationships

The study of Section 12.6 can be carried out in other terms. For example, the









and the maximum likelihood estimators of v a are 


(33) 匕 =仓6^> (九一30， a=l ， ... ， n. 

The estimator (32) can be multiplied by any nonsingular $q \times q$ matrix on the
left to obtain another. For a fuller discussion, see Anderson (1984a) and 
Kendall and Stuart (1973). 


12.7. REDUCED RANK REGRESSION 


that Z a is normally distributed. Izenman (1975) suggested the term reduced 
rank regression. 


12.8. SIMULTANEOUS EQUATIONS MODELS 
12.8.1. The Model 

Inference for structural equation models in econometrics is related to canonical
correlations. The general model is

(1)  $B y_t + \Gamma z_t = u_t, \qquad t = 1, \ldots, T,$

where $B$ is $G \times G$ and $\Gamma$ is $G \times K$. Here $y_t$ is composed of $G$ jointly
dependent variables (endogenous), $z_t$ is composed of $K$ predetermined
variables (exogenous and lagged dependent) which are treated as “independent”
variables, and $u_t$ consists of $G$ unobservable random variables with

(2)  $\mathcal{E} u_t = 0, \qquad \mathcal{E} u_t u_t' = \Sigma.$

We require $B$ to be nonsingular. This model was initiated by Haavelmo
(1944) and was developed by Koopmans, Marschak, Hurwicz, Anderson,
Rubin, Leipnik, et al., 1944-1954, at the Cowles Commission for Research in
Economics. Each component equation represents the behavior of some group
(such as consumers or producers) and has economic meaning.
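Since $B$ is required to be nonsingular, the structural form (1) determines $y_t$ from $z_t$ and $u_t$. The following is a minimal numerical sketch of that step, with made-up coefficient values for $G = 2$ and $K = 3$. All numbers and helper names are illustrative assumptions, and the coefficient matrix $\Pi = -B^{-1}\Gamma$ is written out only to show the algebra.

```python
def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical structural coefficients (not taken from the text).
B = [[1.0, -0.5], [-0.2, 1.0]]                     # G x G, nonsingular
Gamma = [[-1.0, 0.0, -0.3], [0.0, -2.0, 0.4]]      # G x K

B_inv = inv2(B)
Pi = [[-t for t in row] for row in matmul(B_inv, Gamma)]   # Pi = -B^{-1} Gamma

z = [1.0, 2.0, 3.0]      # one observation of the predetermined variables
u = [0.1, -0.2]          # one draw of the structural disturbance
v = matvec(B_inv, u)     # transformed disturbance v = B^{-1} u

y = [p + q for p, q in zip(matvec(Pi, z), v)]      # y = Pi z + v

# Substituting y back into the structural form should return u.
recovered_u = [p + q for p, q in zip(matvec(B, y), matvec(Gamma, z))]
print(recovered_u)
```

The check at the end substitutes the solved $y_t$ back into $B y_t + \Gamma z_t$, which should reproduce $u_t$ up to floating-point error.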

The set of structural equations (1) can be solved for $y_t$ (because $B$ is
nonsingular):

(3)  $y_t = \Pi z_t + v_t,$

where

(4)  $\Pi = -B^{-1}\Gamma, \qquad v_t = B^{-1} u_t,$

with

















multivariate regressio 


12.8.2. Identification by Specified Zeros






































Dr has been called the least vari- 


the coefficients of this single equation, the estimator is maximum likelihood 
and is known as the limited-information maximum likelihood (LIML) estimator 
[Anderson and Rubin (1949)]. 

The algebra of minimizing (25) is to find the smallest root, say v y of 

(26) |p 12 s 22 . 1 p; 2 -An 11 | = o 

and the corresponding vector satisfying 

(27) P 12 S 221 尸 ; Xj 

The vector is normalized according to some rule. A frequently used rule is to 












on (1951b) ，（ 1976), (1984a)]. 


>f vT(p*-p*) defined by (28) and 




estimator and the TSLS estimator is $O_p(1/T)$. We have

(53) Pt,ML - P* TC LS 

=(p^.^y 1 pf 2 s 22 . iP \ 2 

~( P *2 S 22-1 P *2 ~ *^11) ( P *2 S 22-lP'l2 - ^W (1) ) 

= [(pfiSn.^y 1 - - 

+ ( 尸 ； — 

+ v(p* 1 s 12 . i p* 1 - v a u y^ m 

= 0»=O p ( + ). 

Consider 

(54) p' n +Pti^=P[^ =A^., £ Z?-^=A- n \ E 

，=1 / =1 

where zf ^=z^-A 2 A^z^\ Thus A^ 12 + P ,*2 3*) = ^； 2 P = 0 and 

(55) ^(p ； 2 +p^p*) (/ 12 +p- P *y= 奸 ; 2 P ( 尸 ; 2 P)'= a n A- 2 l,. 

Note that - 3* = -iP^i^P^P^-yPh^ and (H',0)y, + 

(r / ,o)z r -M lr 

Theorem 12.8.1. Under the conditions of Theorem 8.11.1 

(56) ^(P*L,ML - P” 〜 [0, ffll (n 12 趄爲厂 1 ]. 

Proof. The theorem follows from (55) ， S 22 ] — S^]，and P n 4 n i2 . 

■ 

Because of the correspondence between the LIML estimator and the
maximum likelihood estimator for the linear functional relationship as outlined
in Section 12.7.5, this asymptotic theory can be translated for the latter
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CANONICAL CORRELATIONS AND CANONICAL VARIABLES 
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The Distributions of Characteristic 
Roots and Vectors 


13.1. INTRODUCTION 



principal components are equal. Some limiting distributions are obtained for
elliptically contoured distributions.

13.2. THE CASE OF TWO WISHART MATRICES 
13.2.1. The Transformation 

Let us consider $A^*$ and $B^*$ ($p \times p$) distributed independently according to
$W(\Sigma, m)$ and $W(\Sigma, n)$, respectively ($m, n > p$). We shall call the roots of

(1)  $|A^* - l B^*| = 0$

the characteristic roots of $A^*$ in the metric of $B^*$ and the vectors satisfying
















































for 0 … </j < 1, where 

(4?) T p an)T p am)T p a P )- 

The density of $l_i$ is obtained from (46) by letting

(48)  $f_i = \dfrac{l_i}{l_i + 1};$

we have

(49)  $\dfrac{df_i}{dl_i} = \dfrac{1}{(l_i + 1)^2}, \qquad f_i - f_j = \dfrac{l_i - l_j}{(l_i + 1)(l_j + 1)}.$
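The change of variable behind (48) and (49) can be checked on a small case. The sketch below assumes that (48) is the substitution $f_i = l_i/(l_i + 1)$ linking the roots of $|A - f(A + B)| = 0$ to the roots of $|A - lB| = 0$. The matrices and the function name `gen_roots_2x2` are hypothetical, chosen only so that the quadratic can be solved in closed form.

```python
import math


def gen_roots_2x2(a, b):
    """Roots of |A - t B| = 0 for symmetric 2x2 A and positive definite 2x2 B."""
    q2 = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    q1 = (a[0][0] * b[1][1] + a[1][1] * b[0][0]
          - a[0][1] * b[1][0] - a[1][0] * b[0][1])
    q0 = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = math.sqrt(q1 * q1 - 4.0 * q2 * q0)
    # Larger root first.
    return [(q1 + disc) / (2.0 * q2), (q1 - disc) / (2.0 * q2)]


# Hypothetical positive definite matrices (not taken from the text).
A = [[2.0, 0.5], [0.5, 1.0]]
B = [[1.0, 0.2], [0.2, 3.0]]
A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

l = gen_roots_2x2(A, B)            # roots of |A - l B| = 0
f = gen_roots_2x2(A, A_plus_B)     # roots of |A - f (A + B)| = 0

for li, fi in zip(l, f):
    print(fi, li / (li + 1.0))
```

Because $l \mapsto l/(l + 1)$ is increasing for $l > 0$, the ordering of the roots is preserved, which is why the two lists can be compared position by position.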

Thus the density of $l_i$ is

(so) c 2 n/p-- ,) n(/ i +i) _>,+ " ) n^-^) 


are the roots of

(52)  $|UU' - f I_p| = 0.$

We shall show that the nonzero roots $f_1 > \cdots > f_m$ (these roots being distinct
with probability 1) are the roots of

(53)  $|U'U - f I_m| = 0.$

For each root $f \ne 0$ of (52) there is a vector $x$ satisfying

(54)  $(UU' - f I_p) x = 0.$

Multiplication by $U'$ on the left gives

(55)  $0 = U'(UU' - f I_p) x = (U'U - f I_m)(U'x).$

Thus $U'x$ is a characteristic vector of $U'U$ and $f$ is the corresponding root.
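The correspondence between the nonzero roots of $UU'$ and the roots of $U'U$ can be illustrated with a small matrix. This is a sketch under assumed values: the $3 \times 2$ matrix `U` and all helper names are hypothetical. It solves the $2 \times 2$ characteristic quadratic of $U'U$ and then evaluates the $3 \times 3$ determinant $|UU' - f I_p|$ at those roots.

```python
import math


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


U = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]    # hypothetical p x m matrix, p = 3, m = 2
UtU = matmul(transpose(U), U)                # m x m
UUt = matmul(U, transpose(U))                # p x p, rank at most m

# Roots of |U'U - f I_m| = 0 from the 2x2 characteristic quadratic.
trace = UtU[0][0] + UtU[1][1]
det = UtU[0][0] * UtU[1][1] - UtU[0][1] * UtU[1][0]
disc = math.sqrt(trace * trace - 4.0 * det)
roots = [(trace + disc) / 2.0, (trace - disc) / 2.0]


def char_poly_UUt(f):
    """Value of |UU' - f I_p| at f."""
    shifted = [[UUt[i][j] - (f if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)


def vector_check(f):
    """Build x = U y with (U'U - f I) y = 0; return x, UU'x, and U'x."""
    y = [UtU[0][1], f - UtU[0][0]]
    x = matvec(U, y)
    return x, matvec(UUt, x), matvec(transpose(U), x)


for f in roots:
    print(f, char_poly_UUt(f))
```

Each root of the smaller problem should make the larger determinant vanish, and the remaining root of $|UU' - f I_p| = 0$ is zero because $UU'$ has rank at most $m$.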

It was shown in Section 8.4 that the density of $U = U_*'$ is (for $I_p - UU'$
positive definite or $I_m - U_* U_*'$ positive definite)

(56)  $K |I_p - UU'|^{(n-p-1)/2} = K |I_{p^*} - U_* U_*'|^{(n^*-p^*-1)/2},$

where $p^* = m$, $n^* - p^* - 1 = n - p - 1$, and $m^* = p$. Thus $f_1, \ldots, f_m$ must be
distributed according to (46) with $p$ replaced by $m$, $m$ by $p$, and $n$ by
$n + m - p$, that is,

( 5 ) r m (»r m lKm + n-p)]r m ap) 

£■=1 L *<) 

Theorem 13.2.3. If $A$ is distributed as $W_1 W_1'$, where the $m$ columns of $W_1$
are independent, each distributed according to $N(0, \Sigma)$, $m < p$, and $B$ is
independently distributed according to $W(\Sigma, n)$, $n > p$, then the density of the nonzero
roots of $|A - f(A + B)| = 0$ is given by (57).

These distributions of roots were found independently and at about the
same time by Fisher (1939), Girshick (1939), Hsu (1939a), Mood (1951), and
Roy (1939). The development of the Jacobian in Section 13.2.2 is due mainly
to Hsu [as reported by Deemer and Olkin (1951)].







.ans^ation be /(L,C). Then the joint density of L and 
^(/ 1 ,...,/ /7 )/(L,C). To prove the theorem we must show that 

(5) / "• ff(L，C)dc r " dc p(p ^ )/2 - . 


We show this by taking a special case where B = UU f and Uipxm, 
has the density 




I u __( … 


13.3 THE CASE OF ONE NONSINGULAR WISHART MATRIX 
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"^hen by Lemma 13.3.1, which will be stated below, B has the density 

(7) 觀 S ； 卜 - … 

=g*(iu---j P y 

The joint density of $L$ and $C$ is $f(L, C)\, g^*(l_1, \ldots, l_p)$. In the preceding section
we proved that the marginal density of $L$ is (50). Thus

(8)  $\int \cdots \int g^*(l_1, \ldots, l_p) f(L, C)\, dC = g^*(l_1, \ldots, l_p) \int \cdots \int f(L, C)\, dC$




r,(ip) 




This proves (5) and hence the theorem. ■ 

The statement above (7) is based on the following lemma: 

Lemma 13.3.1. If the density of $Y$ ($p \times m$) is $f(YY')$, then the density of
$B = YY'$ is


⑼ 


Y p{\ m ) 


The proof of this, like that of Theorem 13.3.1, depends on exhibiting a
special case; let $f(YY') = (2\pi)^{-pm/2} e^{-\frac12 \operatorname{tr} YY'}$; then (9) is $w(B \mid I, m)$.

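Now let us find the density of the roots of (1). The density of $A$ is

(10)  $\dfrac{|A|^{(n-p-1)/2} e^{-\frac12 \operatorname{tr} A}}{2^{pn/2}\, \Gamma_p(\frac12 n)} = \dfrac{\prod_{i=1}^{p} l_i^{(n-p-1)/2} \exp\left(-\frac12 \sum_{i=1}^{p} l_i\right)}{2^{pn/2}\, \Gamma_p(\frac12 n)}.$

Thus by the theorem we obtain as the density of the roots of $A$

(11)  $\dfrac{\pi^{p^2/2} \prod_{i=1}^{p} l_i^{(n-p-1)/2} \exp\left(-\frac12 \sum_{i=1}^{p} l_i\right) \prod_{i<j} (l_i - l_j)}{2^{pn/2}\, \Gamma_p(\frac12 n)\, \Gamma_p(\frac12 p)}.$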



















Theorem 13.3.5. Let $B = B'$ have the density

(25)  $\pi^{-p(p+1)/4}\, 2^{-p/2}\, e^{-\frac12 \operatorname{tr} B^2}.$

Then the characteristic roots $l_1 > \cdots > l_p$ of $B$ have the density

(26)  $2^{-p/2}\, \pi^{p(p-1)/4}\, \Gamma_p^{-1}(\tfrac12 p) \exp\left(-\tfrac12 \sum_{i=1}^{p} l_i^2\right) \prod_{i<j} (l_i - l_j),$

and the matrix $Y$ of the normalized characteristic vectors ($y_{1i} \ge 0$) is independently
distributed according to the conditional Haar invariant distribution.

Proof. Since the characteristic roots of $B^2$ are $l_1^2, \ldots, l_p^2$ and $\operatorname{tr} B^2 = \sum l_i^2$,
the theorem follows directly. ■

Corollary 13.3.2. Let nS be distributed according to W(I, n\ and define the 
diagonal matrix L and B by 5 = C'LC, C / C = /, l x > > l p , and c n >0, 

/ Then the density of the limiting distribution of ^n(L-l) = D 

diagonal is (26) with replaced by and the matrix C is independently 
distributed according to the conditional Haar measure. 

Proof. The density of the limiting distribution of }fn{S — I) is (25), and the 
diagonal elements of D are the characteristic roots of }fn{S —I) and the 
columns of C are the characteristic vectors. ■ 


13.4. CANONICAL CORRELATIONS 


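The sample canonical correlations were shown in Section 12.3 to be the
square roots of the roots of

(1)  $|A_{12} A_{22}^{-1} A_{21} - f A_{11}| = 0,$

where

(2)  $A_{ij} = \sum_{\alpha=1}^{N} \left(X_\alpha^{(i)} - \bar{X}^{(i)}\right)\left(X_\alpha^{(j)} - \bar{X}^{(j)}\right)', \qquad i, j = 1, 2,$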


and the distribution of 

( 3 ) 


X = 


A ： ⑴） 

X 2) l 


















For the moment assume {Y^} t 


which is the density of the square of the sample multiple correlation coefficient
between $X^{(1)}$ ($p_1 = 1$) and $X^{(2)}$ ($p_2 = p - 1$).


13.5. ASYMPTOTIC DISTRIBUTIONS IN THE CASE OF 
ONE WISHART MATRIX 




jisiriDuiea according to wy2, n ,n -p 2 ) and 
: rms of Q the equation (1) defining / is 

_ /(^lI-2 + fi)l = 0 - 
















wnere me suomairices 
and 1/i/n . The ortho: 

斷 (Y I 





uistriDuuon oi u\ ana aenn 
the density (25) of Section 13.3. 




)1 + . 


ffnff^um^KKfewtR 


Tjmmsmmmjm 







( The details of verifying that U(n) and 

(19) (D 2 {n),Y n (n)) =f„(U(nn)) 

G •> 

f satisfy the conditions of the theorem have been given by Anderson (1963a). 

13.6. ASYMPTOTIC DISTRIBUTIONS IN THE CASE OF
TWO WISHART MATRICES

13.6.1. All Population Roots Different 

In Section 13.2 we studied the distributions of the roots $l_1 \ge l_2 \ge \cdots \ge l_p$ of

(1)  $|S^* - l T^*| = 0$

and the vectors satisfying

(2)  $(S^* - l T^*) x^* = 0$















The diagonal elements of (18) and the components 


( 20 ) 

( 21 ) 

( 22 ) 


= 2 *;/. 

U ii-K v u = di， 

U u- v ii x i = ( x i- k i) h in 


The limiting distribution of $H$ and $D$ is normal with means 0. The pairs
$(h_{ij}, h_{ji})$ of off-diagonal elements of $H$ are independent with variances


(23) 


and covariances 




'(入 t +〜） 

刀( 入 / -入 y ) 2 ， 




(24) 


你 （ h U ， h ji)= - 


入/入+刀） 
”(入/-入 /) 2 




The pairs of diagonal elements of D and H are independent with 

variances (5)， 


(25) 


and covariance 


(26) 


■^nhu) = i 


^€(d h h u ) = - A,-. 


The diagonal elements of D and H are independent of the off-diagonal 
elements of H, 

That the limiting distribution of $D$ and $H$ is normal is justified by
Theorem 4.2.3. $S$ and $T$ are polynomials in $L$ and $G$, and their derivatives
are polynomials and hence continuous. Since the equations (12) with auxiliary
conditions can be solved uniquely for $L$ and $G$, the inverse function is also
continuously differentiable at $L = \Lambda$ and $G = I$. By Theorem 4.2.3,
$D = \sqrt{n}(L - \Lambda)$ and $H = \sqrt{n}(G - I)$ have a limiting normal distribution. In
turn, $X = G^{-1}$ is continuously differentiable at $G = I$, and
$Z = \sqrt{n}(X - I) = \sqrt{n}(G^{-1} - I)$ has the limiting distribution of $-H$.
(Expand $\sqrt{n}\{[I + (1/\sqrt{n})H]^{-1} - I\}$.) Since $G \xrightarrow{p} I$, $X \xrightarrow{p} I$ and $x_{ii} > 0$,
$i = 1, \ldots, p$, with probability approaching 1. Then $Z^* = \sqrt{n}(X^* - \Gamma)$ has the
limiting distribution of $\Gamma Z$. (Since $X \xrightarrow{p} I$, we have $X^* = \Gamma X \xrightarrow{p} \Gamma$ and
$x_{ii}^* > 0$, $i = 1, \ldots, p$, with probability approaching 1.) The asymptotic variances and covariances (6) to










13.7. ASYMPTOTIC DISTRIBUTION IN A REGRESSION MODEL 
13.7.1. Both Sets of Variates Stochastic 

The sample canonical correlations / and vectors 

y v ...,y p2 are defined in Section 12.3. The set t ，…， 7 P2 and / 1 ,...,/ p2 
defined by 


⑴ 


5 2i 5 n l5 i27 7 ， S 22 7 = 1. 




A diagonal element of the upper submatrix equation of (10) is 


(12) A 卜士 E [(w ； a^a - K) - - - ! )] +0 /.( 1 )- 


The right-hand side of (12) is the expansion of the sample correlation
coefficient of $u_{i\alpha}$ and $v_{i\alpha}$. See Section 4.2.3. The limiting distribution of $\lambda_i^*$ is
$N[0, (1 - \lambda_i^2)^2]$.

The (/,;')th component of Hf, in (10) is 


(13) 

(〜 2 _ = 去亡 (V,.o W > + k i U ic V ia - '.V,.。、_ A J V i a U ja ) + 〜⑴. 

i ^ j ^ j = 1 — i P\- 


The asymptotic covariance of (\j - and (Af - Ay)/!*/ is 

[( 1 - 入 ; 2 )( 入 ? + 入卜 2 判 （1 - 入 ? )(1 - 入 J)( 入？ + 入 ” 

(14) |_(1_ 入? )(1- AJ ) (入? + 入?） （1 -入? )( 入? + 入卜2 入⑹ 


The pair (hfj, hp is uncorrelated with other pairs. 、 

Suppose p l = p 2 - Then t* = THf x =TH' ¥ . Let r = ( 7 1； ..., 7 Pi ), f = 
( 今】， … ，今 p )_ Then 7 ； = where h* jy i is obtained from (13) and 

h% from (9). We obtain 


(15) 


n^(yj- 7))(7 ； -7；y = \yn) + (1-A ; 2 ) L 

女 =1 


+ A y - A / 

(入卜入以 


(16) /z^(7.-7 ; )(7 / -7/V : 


,-A ; 2 )(1-A?)(Aj + A?) 


7/7；^ 


Anderson (1999a), has also given the asymptotic covariances of d ; and of 
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From (39) and (9) we obtain 




We can compare 0, with the maximum likelihood estimator unrestricted by 

a rank condition © - S uv Syy, Then 

(41) = yfn (S - S) = {S uv - 

[ 5 ^ 5*^1 

= S^ K +O p (l) = 2l *22 + 0 P (”， 

\^wv \ 

since S VV ^I. The effect of the rank restriction is to replace the lower 
right-hand submatrix of by 0 (the parameter value). 一 
Since S* wy = (l/^ )E：.we have vecS* wv = (\/H 
Because V a and W a are independent, 

(42) “ecSKvecS^)’ 

= SW 1 ® (/-A 2 ) = diag(/- A 2 ,A 2 ). 

where A = diag(A 1 ； 0) and I - A 2 = diag(/- A 2 „/)• On the other hand 


e ⑴ ’’m+w 1 ) 




匕 ⑴ ® W a 

e ® ( T ) +° p ⑴’ 


where V a = {V^',V^y and % = «) ，， f)' Then 


■ vec 的 (vec 知 ) , 




f l k - K\ 0 


=diag(/ p , - A 2 ， •.. ， /p, - A 2 ， 4 - M ， 0 ，…. ’r M ， 0 )• 
where there are k blocks of - A 2 and Pl -k blocks of diag(/ t - A:, ， 0). 
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THE DISTRIBUTIONS OF CHARACTERISTIC ROOTS AND VECTORS 


In the original coordinate system 

(45) vec ( 式 -p)= vec[( A’)—( 化 - 0) 叫 

= [r ® (A’ ） _1 ]vec ( 合 - 0 ) 

=[(r l ,r 2 )®2 zz (A 1 ,A 2 )(/-A 2 ) _1 ]vec(0-®). 

From (44) and (45) we obtain 

(46) S' vec n (B,-p)[vec(B t -p)]' 

+ [r 2 q®S 22 A 1 (4-A 2 1 ) _1 A' 1 X zz l 

= [r 1 r;®s zz ] + [r 2 r;®x zz A 1 (/,-A 2 1 )" 1 A , 1 2 zz ] 

=® 2 ZZ - (r 2 r2 ® S 22 A 2 A' 2 2 zz ) • 

If we define ft = = S zz A! A^/- A^) -1 and FI = 厂 ， then P = X1IT_ 

We have 

(47) n(ft'2^a)" 1 a , =5 ； Z z-2 ZZ a 2 A’ 2 S zz ， 

(48) iKnu-in^Lr—s；^-!^. 

Thus (46) can be written 

(49) / vec Bt (vec Bf) 1 S zz - [^xx ~ 'n'] 

Theorem 13.7.1. Let (X ⑴ ， ， X (2), )' ， a=l ， ... ， n, be observations on the 
random vector x a with mean 0 and covariance matrix X. Let P = S 12 S 2 2 * 
the columns of f \ satisfy (1) and y n > 0. Suppose th 气 义⑴；网(二 = 气上’ 
independent of X {2 \ Then the limiting distribution of vec Bf = i/n vec(^ - p), 

a 么 . ... yv 1 •__ (AC\ //1n\ 


(46) or (49). 


13.8. ELLIPTICALLY CONTOURED DISTRIBUTIONS 
13.8.1. Observations Elliptically Contoured 

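Let $x_1, \ldots, x_N$ be $N$ observations on a random vector $X$ with density

(1)  $|\Psi|^{-1/2}\, g[(x - \nu)' \Psi^{-1} (x - \nu)],$

where $\Psi$ is a positive definite matrix, $R^2 = (x - \nu)' \Psi^{-1} (x - \nu)$, and
$\mathcal{E} R^2 < \infty$. Define $\kappa = p\, \mathcal{E} R^4 / [(\mathcal{E} R^2)^2 (p + 2)] - 1$. Then $\mathcal{E} X = \nu = \mu$ and
$\mathcal{E}(X - \nu)(X - \nu)' = (\mathcal{E} R^2 / p) \Psi = \Sigma$. Define $\bar{x}$ and $S$ as the sample mean
and covariance matrix. Define the orthogonal matrices $\mathrm{B}$ and $B$ and the
diagonal matrices $\Lambda$ and $L$ by

(2)  $\Sigma = \mathrm{B} \Lambda \mathrm{B}', \qquad S = B L B',$

$\lambda_1 > \cdots > \lambda_p$, $l_1 > \cdots > l_p$, $\beta_{1i} \ge 0$, $b_{1i} \ge 0$, $i = 1, \ldots, p$. As in Section 13.5.1,
define $T = \mathrm{B}' S \mathrm{B} = Y L Y'$, where $Y = \mathrm{B}' B$ is orthogonal and $y_{1i} \ge 0$. Then
$\mathcal{E} T = \mathrm{B}' \Sigma \mathrm{B} = \Lambda$.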

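The limiting covariances of $\sqrt{N} \operatorname{vec}(S - \Sigma)$ and $\sqrt{N} \operatorname{vec}(T - \Lambda)$ are

(3)  $\lim_{N \to \infty} N\, \mathcal{E} \operatorname{vec}(S - \Sigma)[\operatorname{vec}(S - \Sigma)]' = (\kappa + 1)(I_{p^2} + K_{pp})(\Sigma \otimes \Sigma) + \kappa \operatorname{vec} \Sigma\, (\operatorname{vec} \Sigma)',$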

(4) lim NS vec(r- A)[vec(T- A)]’ 

A^-»oo 

= (K+l)(I p ,+K pp ) + KwecI p (vec I p ) . 






叫厂 人 k s kl ) 

=(k + l)(A,.A y 5 it 5 y , + A,.A t 5 i ； 5 yt ) + kA,A 4 S, 7 





roots of $|A - f(A + B)| = 0$ be $f_1 > \cdots > f_p$, and let $F = \operatorname{diag}(f_1, \ldots, f_p)$.
Define $E$ ($p \times p$) by $A + B = E'E$, $A = E'FE$, and $e_{i1} \ge 0$, $i = 1, \ldots, p$.

Theorem 13.8.3. The matrices E and F are independent. The density of F is 


(19) 




the density of E is 


1 ) fi (i -/；)) n a. -，); 

/=i /=i i<j }, 


2 P Y p (\p)TT^ n+m -P' 1 

2^ m+n - 2) ^ 2 Y p [{{m+n)\ 


lE'El^^-p^itrE'E). 


( 20 ) 






Factor Analysis 


14.1. INTRODUCTION 

Factor analysis is based on a model in which the observed vector is partitioned
into an unobserved systematic part and an unobserved error part. The
components of the error vector are considered as uncorrelated or independent,
while the systematic part is taken as a linear combination of a relatively
small number of unobserved factor variables. The analysis separates the
effects of the factors, which are of basic interest, from the errors. From
another point of view the analysis gives a description or explanation of the
interdependence of a set of variables in terms of the factors without regard to
the observed variability. This approach is to be compared with principal
component analysis, which describes or “explains” the variability observed.
Factor analysis was developed originally for the analysis of scores on mental
tests; however, the methods are useful in a much wider range of situations,
for example, analyzing sets of tests of attitudes, sets of physical measurements,
and sets of economic quantities. When a battery of tests is given to a
group of individuals, it is observed that the score of an individual on a given
test is more related to his scores on other tests than to the scores of other
individuals on the other tests; that is, usually the scores for any particular
individual are interrelated to some degree. This interrelation is “explained”
by considering a test score of an individual as made up of a part which is
peculiar to this particular test (called error) and a part which is a function of
more fundamental quantities called scores of primary abilities or factor scores.
Since they enter several test scores, it is their effect that connects the various
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FACTOR ANALYSIS 


crucial assumption is that the components of $U$ are uncorrelated. Our
point is that the errors of observation and the specific factors are by


mMiMi 讕 







Simple Structure 


.Identification 
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il (1950) modified Thurstone’s condi- 
>n that satisfies the conditions，thus 















































14.3. MAXIMUM LIKELIHOOD ESTIMATORS FOR RANDOM 
ORTHOGONAL FACTORS 

14.3.1. Maximum Likelihood Estimators

In this section we find the maximum likelihood estimators of the parameters 



















The 
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(24) using (2i 

E [yf~\ 

■m + \ 











































the first characteristic vector 
vector, apply the same procec 
The centroid method can t 




























ral rotation can be approximated manually by a sequence 
rotations. 






















Amemiya (1988a) have derived the asymptotic distribution 
under general conditions. Normality of the observations is 
also Anderson and Amemiya (1988b). 


ESTIMATION OF FACTOR SCORES


’ interest to estimate the factor scores of the individuals in 


itudied. In the model with nonstochastic factors the factor 


ntal parameters that characterize the individuals. As we 
n 14.4)，the maximum likelihood estimators of the parame- 
，...，/") do not exist. We shall therefore study the estima- 
scores on the basis that the structural parameters (H 蚪） 































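When $f_\alpha$ is considered as an incidental parameter, $x_\alpha - \mu$ is an observation
from a distribution with mean $\Lambda f_\alpha$ and covariance matrix $\Psi$. The
weighted least squares estimator of $f_\alpha$ is

(1)  $\hat{f}_\alpha = (\Lambda' \Psi^{-1} \Lambda)^{-1} \Lambda' \Psi^{-1} (x_\alpha - \mu) = \Gamma^{-1} \Lambda' \Psi^{-1} (x_\alpha - \mu),$

where $\Gamma = \Lambda' \Psi^{-1} \Lambda$ (not necessarily diagonal). This estimator is unbiased
and its covariance matrix is

(2)  $\mathcal{E}(\hat{f}_\alpha - f_\alpha)(\hat{f}_\alpha - f_\alpha)' = (\Lambda' \Psi^{-1} \Lambda)^{-1} = \Gamma^{-1}$

by the usual generalized least squares theory [Bartlett (1937b), (1938)]. It is
the minimum variance unbiased linear estimator of $f_\alpha$. If $x_\alpha$ is normal, the
estimator is also maximum likelihood.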

When $f_\alpha$ is considered random [Thomson (1951)], we suppose $X_\alpha$ and $f_\alpha$


I 


—i — 


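$\mathcal{E}(f_\alpha^* \mid f_\alpha) = (I + \Gamma)^{-1} \Gamma f_\alpha,$
$\mathcal{C}(f_\alpha^* \mid f_\alpha) = (I + \Gamma)^{-1} \Gamma (I + \Gamma)^{-1},$

(9)  $\mathcal{E}[(f_\alpha^* - f_\alpha)(f_\alpha^* - f_\alpha)' \mid f_\alpha] = (I + \Gamma)^{-1} (\Gamma + f_\alpha f_\alpha') (I + \Gamma)^{-1},$

(10)  $\mathcal{E}(f_\alpha^* - f_\alpha)(f_\alpha^* - f_\alpha)' = (I + \Gamma)^{-1}.$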


This last matrix, describing the mean squared error, is smaller 


estimator and is appropriate when f a is treated as random. 
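The two estimators of this section can be compared on a toy case with a single factor, where $\Gamma = \Lambda' \Psi^{-1} \Lambda$ is a scalar. The sketch below is illustrative only. The loadings, specific variances, and observation are made up, $\Psi$ is taken to be diagonal, and the random-factor estimator is assumed to have the form $(I + \Gamma)^{-1} \Lambda' \Psi^{-1} (x_\alpha - \mu)$ with $\mathcal{E} f_\alpha f_\alpha' = I$, which is consistent with the conditional mean $(I + \Gamma)^{-1} \Gamma f_\alpha$.

```python
# Hypothetical one-factor model with p = 3 observed variables.
lam = [0.9, 0.8, 0.7]        # loadings Lambda (a single column)
psi = [0.19, 0.36, 0.51]     # diagonal of Psi (specific variances)
mu = [0.0, 0.0, 0.0]         # mean vector, treated as known
x = [1.2, 0.7, 1.0]          # one observation

# Gamma = Lambda' Psi^{-1} Lambda, a scalar when there is one factor.
gamma = sum(l * l / s for l, s in zip(lam, psi))

# Lambda' Psi^{-1} (x - mu), shared by both estimators.
score = sum(l * (xi - m) / s for l, xi, m, s in zip(lam, x, mu, psi))

f_wls = score / gamma             # weighted least squares estimator
f_reg = score / (1.0 + gamma)     # assumed form of the random-factor estimator

var_wls = 1.0 / gamma             # Gamma^{-1}
mse_reg = 1.0 / (1.0 + gamma)     # (I + Gamma)^{-1}

print(f_wls, f_reg, var_wls, mse_reg)
```

In this scalar case the second estimator is the first one shrunk by the factor $\Gamma/(1 + \Gamma)$, and its mean squared error $(1 + \Gamma)^{-1}$ is smaller than $\Gamma^{-1}$.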


PROBLEMS 




(b) Show that when the factor scores are included as data the sufficient set of 
statistics is x,/, C xx = C, 

c "= 去 E 丨(九 - 

(c) Show that the conditional expectations of the covariances in (b) given
$X = (x_1, \ldots, x_N)$, $\Lambda$, $\Phi$, and $\Psi$ are

Ct x = ^(CjX,\,^^)=C xx , 

C ; 广 #(C"U ， /V ， 0, 屯 ）= C,, ( 中 + AOA，） _l 八 O ， 

Cf f = = + A<KA , )" 1 C XJt (^ +AOA 7 )' 1 AO 

+ 0 — OyV (中 + AO. 

(d) Show that the maximum likelihood estimators of A and 中 given O = / are 
A = C^Cff 1 , 


CHAPTER 15 


Patterns of Dependence; 
Graphical Models 


15.1. INTRODUCTION 

An emphasis in multivariate statistical analysis is that several measurements 










ly models which involve several kinds of 
tterns of dependence. 

: s is a visual diagram in which observable 

ts {vertices or nodes) connected by edges and 
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Theorem 15.2.2. The coi 


( 6 ) 


:) =^f(x\z)g(y\z)h(z) 

= k(x\y)l{z\y)fn(y). 

: zb)w (>)， ⑻ implies /(x|z) = k(x\y\ which 


which is the density generating 
form in (8)，implying (7). 1 

Corollary 15.2.1. The relatic 

( 10 ) XMY\a 


■ 


Markov. 


Proof. Lst K\cl(w) = t；i U … Then 

(15) u _ll u 1 |bd(w) Uu 2 U *** U u n , u iL y 2 |bd(w) U h U u 3 U … U u "， 
which by Corollary 15.2.1 implies 

(16) w iL i；j u u 2 |bd(w) U u 3 U … U 
Further, (16) and 

(17) m iL u 3 |bd(w) U 4 U U t; 4 U … U 
imply 

(18) w iL w u u 2 u t) 3 |bd(t/) u u … u 
This procedure leads to 

(19) w _LL 1 ；丨 U … U 〜 |bd(t/), ■ 


A third notion of Markov, namely, global, requires some definitions. 




















difficulty grade recommendation 


where 

( 34 ) K x s)= jk{x B ^ s )dx B . m 

In turn / /4 u5 (j ： /4u5 ) and can be factorized, leading to (28). 


15.3. DIRECTED GRAPHS

We now include relations with a direction; the measurement represented by
one vertex $u$ may precede the measurement represented by another vertex $v$.
In the graph this directed edge is displayed as an arrow pointing from $u$ to $v$.
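A directed graph of this kind can be stored as a list of ordered pairs, one pair per arrow. The sketch below is a minimal illustration with a hypothetical three-vertex graph. The function names `parents` and `children` are assumptions made here for illustration, `parents` being intended to match the set written pa later in this section.

```python
# Each pair (tail, head) stands for an arrow pointing from tail to head.
edges = [("u", "v"), ("u", "w"), ("v", "w")]   # hypothetical graph


def parents(vertex, edge_list):
    """Vertices with an arrow pointing into the given vertex."""
    return {tail for tail, head in edge_list if head == vertex}


def children(vertex, edge_list):
    """Vertices that the given vertex points to."""
    return {head for tail, head in edge_list if tail == vertex}


for name in ("u", "v", "w"):
    print(name, sorted(parents(name, edges)), sorted(children(name, edges)))
```

Storing arrows as ordered pairs keeps the direction explicit: reversing a pair changes the graph, which is not the case for the undirected edges of the preceding section.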






























jritzen (1996)] and Figure 15.5. 
properties as specified by Lauritzen and 
(1990): 


=l ， ... ， r，is locally Markov with respect to 
£^(G); that is, 

，- o-end w (r)\pa !j ,(T). 









suppose that the parent distribution is N( 
and 5 = (l/n)E n a=1 ^ a 4. Tt e likelihood : 

( 2 ) (27ry pn/2 \A\ pn/2 e-^ ,lX 

= exp|-^( A) - ^ 

where A = (A, v ) = S' 1 , T= (t^) = E^ = 



The likelihood is in the exponential 
and statistic T. The maximum likelihood 











(13) and (14) 

ItPR x +tr/ p . ■ 
n to the nroblem of finding a matrix O. 



im 
























■ 醒 mimJ MB 


The Markov property (C3*) specifies X u M for M Gt/cK(r) 

and u e pa B (U)U nb G (U). The property is a restriction on the regression of 

° n ^pa*(T>* 
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ties 

(15)  $(A')' = A,$

(16)  $(A + B)' = A' + B',$

(17)  $(AB)' = B'A',$

again with the restriction (which is understood throughout this book) that at
least one side is meaningful.
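These transposition rules are easy to confirm on small integer matrices. In the sketch below the matrices and helper names are hypothetical, and the third rule is assumed to be $(AB)' = B'A'$. The shapes are chosen so that each side of every identity is meaningful.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def add(x, y):
    return [[p + q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


A = [[1, 2, 3], [4, 5, 6]]     # 2 x 3
C = [[7, 8, 9], [1, 2, 3]]     # 2 x 3, same shape as A, so A + C is defined
B = [[1, 0], [2, 1], [0, 3]]   # 3 x 2, so the product AB is defined

check_15 = transpose(transpose(A)) == A
check_16 = transpose(add(A, C)) == add(transpose(A), transpose(C))
check_17 = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
print(check_15, check_16, check_17)
```

Integer entries are used so that the comparisons are exact and no floating-point tolerance is needed.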

A vector $x$ with $m$ components can be treated as a matrix with $m$ rows








■ 











atrix A is the determinant |yl|, defined 

/l " … .. V ria ",.， 

/= I 

1 permutations (j ly ...J p ) of the set of 
the number of transpositions required 
.transposition consists of interchanging 




BBiBIIIM— !WffiHjBBBngffiWWSBWnU 训 




















Proof. From Corollary A.1.6, A = G'^GO -1 , where G is lower triangular. 
Then T=G~ l is lower triangular. ■ 

In effect this theorem was proved in Section 7.2 for A = W'. 

A.2. CHARACTERISTIC ROOTS AND VECTORS 

The characteristic roots of a square matrix $B$ are defined as the roots of the















BC = 


圍 


■MM 


k - 入 


If $\lambda_i$ is a characteristic root of $B$, then a vector $x_i$ not identically 0
satisfying

(6)  $(B - \lambda_i I) x_i = 0$

is called a characteristic vector (or eigenvector) of the matrix $B$ corresponding
to the characteristic root $\lambda_i$. Any scalar multiple of $x_i$ is also a characteristic
vector. When $B$ is symmetric, $x_i'(B - \lambda_i I) = 0$. If the roots are distinct,

































(18) 


^\B\ _ R +R 

Since \A\ = |B| and B, y = B yl =/l, 7 =A j{ , (17) follows. ■ 

Theorem A.4.3. 

(19)  $\dfrac{\partial}{\partial x}(x'Ax) = 2Ax,$

where $\partial/\partial x$ denotes taking partial derivatives with respect to each component of $x$
and arranging the partial derivatives in a column.
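Formula (19) can be compared with a finite-difference gradient. The sketch below is an illustration with a hypothetical symmetric matrix `A` and point `x`. Symmetry of $A$ is assumed, and central differences are used because they are exact for a quadratic function up to rounding error.

```python
def quad_form(a, x):
    """The scalar x'Ax."""
    n = len(x)
    return sum(x[i] * a[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(func, x, step=1e-6):
    """Central-difference approximation to the column of partial derivatives."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += step
        down[i] -= step
        grad.append((func(up) - func(down)) / (2.0 * step))
    return grad


# Hypothetical symmetric matrix and evaluation point.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, -1.0], [0.0, -1.0, 4.0]]
x = [1.0, -2.0, 0.5]

grad_numeric = numerical_gradient(lambda v: quad_form(A, v), x)
grad_formula = [2.0 * sum(A[i][j] * x[j] for j in range(len(x)))
                for i in range(len(x))]
print(grad_numeric, grad_formula)
```

For a matrix that is not symmetric the same check would match $(A + A')x$ instead, which reduces to $2Ax$ under symmetry.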


Proof. Let $h$ be a column vector of as many components as $x$. Then






(26) 




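Definition A.4.2. If the $p \times m$ matrix $A = (a_1, \ldots, a_m)$, then
$\operatorname{vec} A = (a_1', \ldots, a_m')'$.

Some properties of the vec operator [e.g., Magnus (1988)] are

(27)  $\operatorname{vec} ABC = (C' \otimes A) \operatorname{vec} B,$

(28)  $\operatorname{vec} xy' = y \otimes x.$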
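Properties (27) and (28) can be verified directly on small integer matrices. The following sketch uses hypothetical inputs and helper names: `vec` stacks the columns of a matrix, as in Definition A.4.2, and `kron` forms the Kronecker product with the usual block layout.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec(m):
    """Stack the columns of m into a single list."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]


def kron(a, b):
    """Kronecker product: block (i, j) equals a[i][j] times b."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


A = [[1, 2], [3, 4]]            # 2 x 2
B = [[0, 1, 2], [1, 0, 3]]      # 2 x 3
C = [[1, 0], [2, 1], [0, 1]]    # 3 x 2

lhs_27 = vec(matmul(matmul(A, B), C))
rhs_27 = matvec(kron(transpose(C), A), vec(B))

x = [[1], [2]]                  # column vector with 2 components
y = [[3], [4], [5]]             # column vector with 3 components
lhs_28 = vec(matmul(x, transpose(y)))
rhs_28 = vec(kron(y, x))

print(lhs_27 == rhs_27, lhs_28 == rhs_28)
```

The order of the factors in (28) follows from column stacking: the first block of $\operatorname{vec} xy'$ is the first column of $xy'$, which is $y_1 x$.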

Theorem A.4.6. The Jacobian of the transformation $E = Y^{-1}$ (from $E$ to $Y$)
is $|Y|^{-2p}$, where $p$ is the order of $E$ and $Y$.


Proof. From EY = /, we have 


X 外 0 , 


( 31 ) ( K =- £ (盖十=-广(: 



A.5. GRAM-SCHMIDT ORTHOGONALIZATION AND THE
SOLUTION OF LINEAR EQUATIONS

A.5.1. Gram-Schmidt Orthogonalization 





■i 





•圆 AUIU 训邏 |,_|画 


may be computationally more efficient or more stable. A Householder matrix
has the form $H = I_n - 2aa'$, where $a'a = 1$, and is orthogonal and symmetric.
Such a matrix $H_1$ (i.e., a vector $a$) can be selected so that the first
column of $H_1 V$ has 0's in all positions except the first, which is positive. The
next matrix has the form

(12)  $H_2 = \begin{pmatrix} 1 & 0 \\ 0 & I_{n-1} \end{pmatrix} - 2 \begin{pmatrix} 0 \\ a \end{pmatrix} (0 \;\; a') = \begin{pmatrix} 1 & 0 \\ 0 & I_{n-1} - 2aa' \end{pmatrix}.$

The $(n - 1)$-component vector $a$ is chosen so that the second column of
$H_2 H_1 V$ has all 0's except the first two components, the second being positive. This



(16) 


K=G, [【1 =G(1)r ， 


and G (l) = U. 
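The first Householder step described in this section can be sketched for a single column. The code below is an illustration under assumptions, not the procedure of the text: the vector `v` and the function names are hypothetical, and the unit vector $a$ is taken proportional to $v - \|v\| e_1$, one common choice that makes the first component of $Hv$ positive.

```python
import math


def householder_vector(v):
    """Unit vector a with (I - 2aa')v = ||v|| e_1.

    Assumes v is not already a nonnegative multiple of e_1,
    since then v - ||v|| e_1 would be the zero vector.
    """
    norm = math.sqrt(sum(t * t for t in v))
    w = list(v)
    w[0] -= norm                                  # w = v - ||v|| e_1
    w_norm = math.sqrt(sum(t * t for t in w))
    return [t / w_norm for t in w]


def reflect(a, v):
    """Apply H = I - 2aa' to v without forming H explicitly."""
    dot = sum(s * t for s, t in zip(a, v))
    return [t - 2.0 * dot * s for s, t in zip(a, v)]


v = [3.0, 4.0, 12.0]            # hypothetical first column, with norm 13
a = householder_vector(v)
hv = reflect(a, v)              # zeros below the first component
back = reflect(a, hv)           # H is its own inverse, so this returns v
print(hv, back)
```

Applying the reflection twice returns the original vector, which reflects the fact that $H$ is both symmetric and orthogonal.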

A.5.2. Solution of Linear Equations 











APPENDIX B


1 -^32 1.716 1.791 1.857 1.916 1.971 

1.302 1.359 1.410 1.458 1.501 1.542 

U90 1.232 1.272 1.309 1.344 1.377 

1.133 1.167 1.199 1.229 1.258 1.286 

1.100 1.127 1.154 1.179 1.204 1.228 

1.078 1.101 1.123 1.145 1.167 1.188 

1.063 1.082 1.101 1.121 1.139 1.158 

1.052 1.068 1.085 1.102 1.119 1.135 

1.043 1.058 1.073 1.088 1.102 1.117 

1.037 1.050 1.063 1.076 1.089 1.103 

1.028 1.038 1.048 1.059 1.070 1.081 

1.020 1.027 1.035 1.043 1.052 1.060 

1.012 1.017 1.022 1.028 1.034 1.040 

1.006 1.009 1.011 1.015 1.018 1.021 

1.002 1.002 1.003 1.004 1.006 1.007 

1.000 1.000 1.000 1.000 1.000 1.000 

36.4150 43.7730 50.9985 58.1240 65.1708 72.1532 


Tables 


Likelihood Criterion: Factors C ( p , m , M ) 
> Adjust to where M = n-p + l 


5% Significance Level 


p-3 

10 


.535.241.145.099.072.0561.036.030.025.019.013.008.004.001.000.8693 
11111 11111 111111 8 

.422.174.099.065.046.035.027.022.018.015.01111 
11111 11111 111111 1 

.295.109S.025.018.014.011.009.007.005.003Z.000.000.5916 
11111 11111 111111 2 
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1.001 

1.000 

1.000 

1.000 


TABLE 

Significance Points for the Modified Likelihood Ratio Test 2 = 2 0 


Pr {- 

-21ogAT > x) 

= 0.05 


n 5% 1% 

n 5% 

1 % n 

5 % 1% 

^ 5% 1% 

尸 = 2 

P = 

3 

尸 = 5 

尸 = 6 

2 13.50 19.95 

4 18.8 

25.6 9 

32.5 40.0 

12 40.9 49.0 
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